Abstract: A method for efficiently constructing polar codes is presented and analyzed. Although polar codes are explicitly defined, straightforward construction is intractable since the resulting polar bit-channels have an output alphabet that grows exponentially with the code length. Thus the core problem that needs to be solved is that of faithfully approximating a bit-channel with an intractably large alphabet by another channel having a manageable alphabet size. We devise two approximation methods which "sandwich" the original bit-channel between a degraded and an upgraded version thereof. Both approximations can be efficiently computed, and turn out to be extremely close in practice. We also provide theoretical analysis of our construction algorithms, proving that for any fixed $\varepsilon > 0$ and all sufficiently large code lengths $n$, polar codes whose rate is within $\varepsilon$ of channel capacity can be constructed in time and space that are both linear in $n$.

Index Terms: channel polarization, channel degrading and upgrading, construction algorithms, polar codes

I. Introduction 

POLAR codes, invented by Arikan [3], achieve the capacity of arbitrary binary-input symmetric DMCs. Moreover, they have low encoding and decoding complexity and an explicit construction. Following Arikan's seminal paper [3], his results have been extended in a variety of important ways. In [22], polar codes have been generalized to symmetric DMCs with non-binary input alphabet. In [14], the polarization phenomenon has been studied for arbitrary kernel matrices, rather than Arikan's original $2 \times 2$ polarization kernel, and error exponents were derived for each such kernel. It was shown in [24] that, under list-decoding, polar codes can achieve remarkably good performance at short code lengths. In terms of applications, polar coding has been used with great success in the context of multiple-access channels [2], [23], wiretap channels [16], data compression [1], [4], write-once channels [6], and channels with memory [21]. In this paper, however, we will restrict our attention to the original setting introduced by Arikan in [3]. Namely, we focus on binary-input, discrete, memoryless, symmetric channels, with the standard $2 \times 2$ polarization kernel under standard successive cancellation decoding.

Although the construction of polar codes is explicit, there is only one known instance, namely the binary erasure channel (BEC), where the construction is also efficient. A first attempt at an efficient construction of polar codes in the general case was made by Mori and Tanaka [17], [18]. Specifically, it is shown in [17] that a key step in the construction of polar bit-channels can be viewed as an instance of density evolution [20]. Based on this observation, Mori and Tanaka [18] proposed a construction algorithm utilizing convolutions, and proved that the number of convolutions needed scales linearly with the code length. However, as indeed noted in [17], it is not clear how one would implement such convolutions to be sufficiently precise on one hand while being tractable on the other hand.

In this paper, we further extend the ideas of [17], [18]. An exact implementation of the convolutions discussed in [17], [18] implies an algorithm with memory requirements that grow exponentially with the code length. It is thus impractical. Alternatively, one could use quantization (binning) to try and reduce the memory requirements. However, for such a quantization scheme to be of interest, it must satisfy two conditions. First, it must be fast enough, which usually translates into a rather small number of quantization levels (bins). Second, after the calculations have been carried out, we must be able to interpret them in a precise manner. That is, the quantization operation introduces inherent inaccuracy into the computation, which we should be able to account for so as to ultimately make a precise statement.

Our aim in this paper is to provide a method by which polar codes can be efficiently constructed. Our main contribution consists of two approximation methods. In both methods, the memory limitations are specified, and not exceeded. One method is used to get an upper bound on the probability of error of each polar bit-channel while the other is used to obtain a lower bound. The quantization used to derive an upper bound on the probability of error is called a degrading quantization, while the other is called an upgrading quantization. Both quantizations transform the "current channel" into a new one with a smaller output alphabet. The degrading quantization results in a channel degraded with respect to the original one, while the upgrading quantization results in a channel such that the original channel is degraded with respect to it.

The fidelity of both degrading and upgrading approximations is a function of a parameter $\mu$, which can be freely set to an arbitrary integer value. Generally speaking, the larger $\mu$ is the better the approximation. The running time needed in order to approximate all $n$ polar bit-channels is $O(n \cdot \mu^2 \log \mu)$.

Our results relate to both theory and practice of polar codes. In practice, it turns out that the degrading and upgrading approximations are typically very close, even for relatively small values of the fidelity parameter $\mu$. This is illustrated in what follows with the help of two examples.

Example 1. Consider a polar code of length $n = 2^{20}$ for the binary symmetric channel (BSC) with crossover probability 0.11. Let $W_0, W_1, \ldots, W_{n-1}$ be the corresponding bit-channels (see the next section for a rigorous definition of a bit-channel). The basic task in the construction of polar codes is that of classifying bit-channels into those that are "good" and those that are "bad." Let $P_e(W_i)$ denote the probability of error on the $i$-th bit-channel (see (13) for a precise definition of this quantity) for $i = 0, 1, \ldots, n-1$. We arbitrarily choose a threshold of $10^{-9}$ and say that the $i$-th bit channel is good if $P_e(W_i) \le 10^{-9}$ and bad otherwise. How well do our algorithms perform in determining for each of the $n$ bit-channels whether it is good or bad?

Fig. 1. Upper and lower bounds on the bit-channel probabilities of error for a polar code of length $n = 1{,}048{,}576$ on BSC(0.11), computed using degrading and upgrading algorithms with $\mu = 256$. Only those 132 bit-channels for which the gap between the upper and lower bounds crosses the $10^{-9}$ threshold are shown.

Fig. 2. Upper and lower bounds on $P_{W,n}(k)$ as a function of rate $R = k/n$, for two underlying channels and two code lengths $n = 2^{10}$ and $n = 2^{20}$: (a) binary symmetric channel BSC(0.001); (b) binary-input AWGN channel with $E_s/N_0 = 5.00$ dB. The upper bound is dashed while the lower bound is solid. For both channels, the difference between the bounds can only be discerned in the plot corresponding to $n = 2^{20}$.

Let us set $\mu = 256$ and compute upper and lower bounds on $P_e(W_i)$ for all $i$, using the degrading and upgrading quantizations, respectively. The results of this computation are illustrated in Figure 1. In 1,048,444 out of the 1,048,576 cases, we can provably classify the bit-channels into good and bad. Figure 1 depicts the remaining 132 bit-channels for which the upper bound is above the threshold whereas the lower bound is below the threshold. The horizontal axis in Figure 1 is the bit-channel index while the vertical axis is the gap between the two bounds. We see that the gap between the upper and lower bounds, and thus the remaining uncertainty as to the true value of $P_e(W_i)$, is very small in all cases. □
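The three-way classification used in Example 1 can be sketched in a few lines. This is only an illustration: the function name `classify` and the numeric bounds below are hypothetical and are not taken from the paper's computation.

```python
def classify(lower, upper, threshold):
    """Classify a bit-channel whose error probability lies in [lower, upper]."""
    if upper <= threshold:
        return "good"          # even the pessimistic (degraded) bound passes
    if lower > threshold:
        return "bad"           # even the optimistic (upgraded) bound fails
    return "undetermined"      # the two bounds straddle the threshold


# Hypothetical (lower, upper) bounds for three bit-channels.
bounds = [(1e-12, 3e-12), (0.20, 0.21), (9.9e-10, 1.1e-9)]
labels = [classify(lo, hi, 1e-9) for lo, hi in bounds]
print(labels)
```

Only channels in the "undetermined" class, 132 of them in Example 1, remain unclassified by the two bounds.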

Example 2. Now suppose we wish to construct a polar code of a given length $n$ having the best possible rate while guaranteeing a certain block-error probability $P_{\text{block}}$ under successive cancellation decoding. Arikan [3, Proposition 2] provides$^1$ a union bound on the block-error rate of polar codes:

$$P_{\text{block}} \le \sum_{i \in \mathcal{A}} P_e(W_i), \tag{1}$$

where $\mathcal{A}$ is the information set for the code (the set of unfrozen bit-channels). The construction problem for polar codes can be phrased (cf. [3, Section IX]) as the problem of choosing an information set $\mathcal{A}$ of a given size $|\mathcal{A}| = k$ so as to minimize the right-hand side of (1). Assuming the underlying channel $W$ and the code length $n$ are fixed, let

$$P_{W,n}(k) \overset{\text{def}}{=} \min_{|\mathcal{A}| = k} \sum_{i \in \mathcal{A}} P_e(W_i). \tag{2}$$

$^1$ In [3], Arikan uses the Bhattacharyya parameter $Z(W_i)$ instead of the probability of error $P_e(W_i)$. As we shall see shortly, this is of no real importance.

Using our degrading and upgrading algorithms, we can efficiently compute upper and lower bounds on $P_{W,n}(k)$. These are plotted in Figure 2 for two underlying channels: BSC with crossover probability 0.001 and the binary-input AWGN channel with a symbol SNR of 5.00 dB (noise variance $\sigma^2 = 0.1581$). In all$^2$ our calculations, the value of $\mu$ did not exceed 512.

As can be seen from Figure 2, the bounds effectively coincide. As an example, consider polar codes of length $2^{20}$ and suppose we wish to guarantee $P_{W,n}(k) \le 10^{-9}$. What is the best possible rate of such a code? According to Figure 2(a), we can efficiently construct (specify the rows of a generator matrix) a polar code of rate $R = 0.9732$. On the other hand, we can also prove that there is no choice of an information set $\mathcal{A}$ in (2) that would possibly produce a polar code of rate $R \ge 0.9737$. According to Figure 2(b), the corresponding numbers for the binary-input AWGN channel are 0.9580 and 0.9587. In practice, such minute differences in the code rate are negligible. □

$^2$ The initial degrading (upgrading) transformation of the binary-input continuous-output AWGN channel to a binary-input channel with a finite output alphabet was done according to the method of Section VI. For that calculation, we used a finer value of $\mu = 2000$. Note that the initial degrading (upgrading) transformation is performed only once.

From a theoretical standpoint, one of our main contributions is the following theorem. In essence, the theorem asserts that capacity-achieving polar codes can be constructed in time that is polynomial (in fact, linear) in their length $n$.

Theorem 1: Let $W$ be a binary-input, symmetric, discrete memoryless channel of capacity $I(W)$. Fix arbitrary real constants $\varepsilon > 0$ and $\beta < 1/2$. Then there exists an even integer

$$\mu_0 = \mu_0(W, \varepsilon, \beta), \tag{3}$$

which does not depend on the code length $n$, such that the following holds. For all even integers $\mu \ge \mu_0$ and all sufficiently large code lengths $n = 2^m$, there is a construction algorithm with running time $O(n \cdot \mu^2 \log \mu)$ that produces a polar code for $W$ of rate $R \ge I(W) - \varepsilon$ such that $P_{\text{block}} \le 2^{-n^{\beta}}$, where $P_{\text{block}}$ is the probability of codeword error under successive cancellation decoding.



We defer the proof of Theorem 1 to Section VIII. Here, let us briefly discuss two immediate consequences of this theorem. First, observe that for a given channel $W$ and any fixed $\varepsilon$ and $\beta$, the integer $\mu_0$ in (3) is a constant. Setting our fidelity parameter in Theorem 1 to $\mu = \mu_0$ thus yields a construction algorithm with running time that is linear in $n$. Still, some might argue that the complexity of construction in Theorem 1 does depend on a fidelity parameter $\mu$, and this is unsatisfactory. The following corollary eliminates this dependence altogether, at the expense of super-linear construction complexity.

Corollary 2: Let $W$ be a binary-input, symmetric, discrete memoryless channel of capacity $I(W)$. Fix arbitrary real constants $\varepsilon > 0$ and $\beta < 1/2$. Then there is a construction algorithm with running time $O(n \log^2 n \log\log n)$ that, for all sufficiently large code lengths $n$, produces a polar code for $W$ of rate $R \ge I(W) - \varepsilon$ such that $P_{\text{block}} \le 2^{-n^{\beta}}$.

Proof: Set $\mu = 2\lfloor \log_2 n \rfloor$ in Theorem 1 (in fact, we could have used any function of $n$ that grows without bound). ■
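For readers who want the arithmetic behind the corollary spelled out, the following short derivation substitutes the choice of the fidelity parameter into the running time of Theorem 1. It is a sketch of the omitted step, not part of the original proof text.

```latex
% With \mu = 2\lfloor \log_2 n \rfloor we have \mu \le 2\log_2 n, hence
\[
  n \cdot \mu^2 \log \mu
  \;\le\; n \cdot (2\log_2 n)^2 \cdot \log(2\log_2 n)
  \;=\; O\!\left(n \log^2 n \,\log\log n\right).
\]
% Moreover, \mu is an even integer that grows without bound, so
% \mu \ge \mu_0(W,\varepsilon,\beta) for all sufficiently large n,
% and Theorem 1 applies.
```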

We would now like to draw the reader's attention to what Theorem 1 does not assert. Namely, given $W$, $\varepsilon$ and $\beta$, the theorem does not tell us how large $n$ must be, only that some values of $n$ are large enough. In fact, given $W$, $\varepsilon$, $\beta$, how large does $n$ need to be in order to guarantee the existence of a polar code with $R \ge I(W) - \varepsilon$ and $P_{\text{block}} \le 2^{-n^{\beta}}$, let alone the complexity of its construction? This is one of the central questions in the theory of polar codes. Certain lower bounds on this value of $n$ are given in [10]. In the other direction, the exciting recent result of Guruswami and Xia [11, Theorem 1] shows that for any fixed $W$ and $\beta \le 0.49$, this value of $n$ grows as a polynomial in $1/\varepsilon$. The work of [11] further shows that, for any fixed $W$ and $\beta$, the parameter $\mu_0$ in (3) can also be taken as a polynomial in $1/\varepsilon$.

The rest of this paper is organized as follows. In Section II we briefly review polar codes and set up the necessary notation. Section III is devoted to channel degrading and upgrading relations, which will be important for us later on. In Section IV we give a high level description of our algorithms for approximating polar bit-channels. The missing details in Section IV are then fully specified in Section V. Namely, we show how to reduce the output alphabet of a channel so as to get either a degraded or an upgraded version thereof. In Section VI we show how to either degrade or upgrade a channel with continuous output into a channel with a finite output alphabet of specified size. In Section VII we discuss certain improvements to our general algorithms for a specialized case. The accuracy of the (improved) algorithms is then analyzed in Section VIII.

II. Polar Codes

In this section we briefly review polar codes with the primary aim of setting up the relevant notation. We also indicate where the difficulty of constructing polar codes lies.

Let $W$ be the underlying memoryless channel through which we are to transmit information. If the input alphabet of $W$ is $\mathcal{X}$ and its output alphabet is $\mathcal{Y}$, we write $W : \mathcal{X} \to \mathcal{Y}$. The probability of observing $y \in \mathcal{Y}$ given that $x \in \mathcal{X}$ was transmitted is denoted by $W(y|x)$. We assume throughout that $W$ has binary input and so $\mathcal{X} = \{0, 1\}$. We also assume that $W$ is symmetric. As noted in [3], a binary-input channel $W$ is symmetric if and only if there exists a permutation $\pi$ of $\mathcal{Y}$ such that $\pi^{-1} = \pi$ (that is, $\pi$ is an involution) and $W(y|1) = W(\pi(y)|0)$ for all $y \in \mathcal{Y}$ (see [9, p. 94] for an equivalent definition). When the permutation is understood from the context, we abbreviate $\pi(y)$ as $\bar{y}$, and say that $y$ and $\bar{y}$ are conjugates. For now, we will further assume that the output alphabet $\mathcal{Y}$ of $W$ is finite. This assumption will be justified in Section VI where we show how to deal with channels that have continuous output.

Denote the length of the codewords we will be transmitting over $W$ by $n = 2^m$. Given $\mathbf{y} = (y_0, y_1, \ldots, y_{n-1}) \in \mathcal{Y}^n$ and $\mathbf{u} = (u_0, u_1, \ldots, u_{n-1}) \in \mathcal{X}^n$, let

$$W^n(\mathbf{y}|\mathbf{u}) \overset{\text{def}}{=} \prod_{i=0}^{n-1} W(y_i|u_i).$$

Thus $W^n$ corresponds to $n$ independent uses of the channel $W$. A key paradigm introduced in [3] is that of transforming $n$ identical copies (independent uses) of the channel $W$ into $n$ polar bit-channels, through a successive application of Arikan channel transforms, introduced shortly. For $i = 0, 1, \ldots, n-1$, the $i$-th bit-channel $W_i$ has a binary input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}^n \times \mathcal{X}^i$, and transition probabilities defined as follows. Let $G$ be the polarization kernel matrix of [3], given by

$$G = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

Let $G^{\otimes m}$ be the $m$-fold Kronecker product of $G$ and let $B_n$ be the $n \times n$ bit-reversal permutation matrix defined in [3, Section VII-B]. Denote $\mathbf{u}_{i-1} = (u_0, u_1, \ldots, u_{i-1})$. Then

$$W_i(\mathbf{y}, \mathbf{u}_{i-1} | u_i) \overset{\text{def}}{=} \frac{1}{2^{n-1}} \sum_{\mathbf{v} \in \{0,1\}^{n-1-i}} W^n\bigl(\mathbf{y} \,\big|\, (\mathbf{u}_{i-1}, u_i, \mathbf{v}) B_n G^{\otimes m}\bigr). \tag{4}$$

Given the bit-channel output $\mathbf{y}$ and $\mathbf{u}_{i-1}$, the optimal (maximum-likelihood) decision rule for estimating $u_i$ is

$$\hat{u}_i = \operatorname*{argmax}_{u_i' \in \{0,1\}} W_i(\mathbf{y}, \mathbf{u}_{i-1} | u_i'),$$

with ties broken arbitrarily. This is the decision rule used in successive cancellation decoding [3]. As before, we let $P_e(W_i)$ denote the probability that $\hat{u}_i \ne u_i$ under this rule, assuming that the a priori distribution of $u_i$ is Bernoulli(1/2).

In essence, constructing a polar code of dimension $k$ is equivalent to finding the $k$ "best" bit-channels. In [3], one is instructed to choose the $k$ bit-channels with the lowest Bhattacharyya bound $Z(W_i)$ on the probability of decision error $P_e(W_i)$. We note that the choice of ranking according to these Bhattacharyya bounds stems from the relative technical ease of manipulating them. A more straightforward criterion would have been to rank directly according to the probability of error $P_e(W_i)$, and this is the criterion we will follow here.

Since $W_i$ is well defined through (4), this task is indeed explicit, and thus so is the construction of a polar code. However, note that the output alphabet size of each bit-channel is exponential in $n$. Thus a straightforward evaluation of the ranking criterion is intractable for all but the shortest of codes. Our main objective will be to circumvent this difficulty.

As a first step towards achieving our goal, we recall that the bit-channels can be constructed recursively using the Arikan channel transformations $W \boxplus W$ and $W \circledast W$, defined as follows. Let $W : \mathcal{X} \to \mathcal{Y}$ be a binary-input, memoryless, symmetric (BMS) channel. Then the output alphabet of $W \boxplus W$ is $\mathcal{Y}^2$, the output alphabet of $W \circledast W$ is $\mathcal{Y}^2 \times \mathcal{X}$, and their transition probabilities are given by

$$(W \boxplus W)(y_1, y_2 | u_1) \overset{\text{def}}{=} \frac{1}{2} \sum_{u_2 \in \mathcal{X}} W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2) \tag{5}$$

and

$$(W \circledast W)(y_1, y_2, u_1 | u_2) \overset{\text{def}}{=} \frac{1}{2}\, W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2). \tag{6}$$

One consequence of this recursive construction is that the explosion in the output alphabet size happens gradually: each transform application roughly squares the alphabet size. We will take advantage of this fact in Section IV.
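The two transforms (5) and (6) are easy to state in code for a channel with a small finite output alphabet. The sketch below assumes a channel is stored as a dictionary mapping each output symbol $y$ to the pair $(W(y|0), W(y|1))$; this representation and the function names are illustrative choices, not the paper's.

```python
from itertools import product


def bsc(p):
    """BSC(p) as a dict: output symbol -> (W(y|0), W(y|1))."""
    return {0: (1 - p, p), 1: (p, 1 - p)}


def box_transform(W):
    """W ⊞ W from (5): outputs (y1, y2), input u1."""
    out = {}
    for (y1, w1), (y2, w2) in product(W.items(), repeat=2):
        out[(y1, y2)] = tuple(
            0.5 * sum(w1[u1 ^ u2] * w2[u2] for u2 in (0, 1)) for u1 in (0, 1)
        )
    return out


def star_transform(W):
    """W ⊛ W from (6): outputs (y1, y2, u1), input u2."""
    out = {}
    for (y1, w1), (y2, w2) in product(W.items(), repeat=2):
        for u1 in (0, 1):
            out[(y1, y2, u1)] = tuple(0.5 * w1[u1 ^ u2] * w2[u2] for u2 in (0, 1))
    return out


def error_prob(W):
    """P_e(W) = (1/2) * sum_y min{W(y|0), W(y|1)}."""
    return 0.5 * sum(min(p0, p1) for p0, p1 in W.values())


W = bsc(0.11)
minus, plus = box_transform(W), star_transform(W)
print(len(W), len(minus), len(plus))
print(error_prob(minus), error_prob(plus))
```

Starting from a 2-symbol alphabet, the transformed channels have 4 and 8 output symbols, which shows the growth that makes exact evaluation intractable after many levels.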

III. Channel Degradation and Upgradation 

As previously outlined, our solution to the explosion in growth of the output alphabet of $W_i$ is to replace the channel $W_i$ by an approximation. In fact, we will have two approximations, one yielding a "better" channel and the other yielding a "worse" one. In this section, we formalize these notions.

We say that a channel $Q : \mathcal{X} \to \mathcal{Z}$ is (stochastically) degraded with respect to $W : \mathcal{X} \to \mathcal{Y}$ if there exists a channel $\mathcal{P} : \mathcal{Y} \to \mathcal{Z}$ such that for all $z \in \mathcal{Z}$ and $x \in \mathcal{X}$,

$$Q(z|x) = \sum_{y \in \mathcal{Y}} W(y|x)\, \mathcal{P}(z|y). \tag{7}$$

For a graphical depiction, see Figure 3(a). We write $Q \preceq W$ to denote that $Q$ is degraded with respect to $W$.

In the interest of brevity and clarity later on, we also define the inverse relation: we say that a channel $Q' : \mathcal{X} \to \mathcal{Z}'$ is upgraded with respect to $W : \mathcal{X} \to \mathcal{Y}$ if there exists a channel $\mathcal{P} : \mathcal{Z}' \to \mathcal{Y}$ such that for all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$,

$$W(y|x) = \sum_{z' \in \mathcal{Z}'} Q'(z'|x)\, \mathcal{P}(y|z') \tag{8}$$

(see Figure 3(b)). Namely, $Q'$ can be degraded to $W$. Similarly, we write this as $Q' \succeq W$.

Fig. 3. Degrading and upgrading a channel $W$. (a) Degrading: the original channel $W$ followed by another channel $\mathcal{P}$ gives the degraded channel $Q$. (b) Upgrading: the upgraded channel $Q'$ followed by another channel $\mathcal{P}$ gives the original channel $W$.
By definition,

$$W \preceq W' \quad \text{if and only if} \quad W' \succeq W. \tag{9}$$

Also, it is easily shown that "degraded" is a transitive relation:

$$\text{If } W \preceq W' \text{ and } W' \preceq W'' \text{ then } W \preceq W''. \tag{10}$$

Thus, the "upgraded" relation is transitive as well. Lastly, since a channel is both degraded and upgraded with respect to itself (take the intermediate channel as the identity function), we have that both relations are reflexive:

$$W \preceq W \quad \text{and} \quad W \succeq W. \tag{11}$$

If a channel $W'$ is both degraded and upgraded with respect to $W$, then we say that $W$ and $W'$ are equivalent, and denote this by $W \equiv W'$. Since "degraded" and "upgraded" are transitive relations, it follows that the "equivalent" relation is transitive as well. Also, by (9), we have that "equivalent" is a symmetric relation:

$$W \equiv W' \quad \text{if and only if} \quad W' \equiv W. \tag{12}$$

Lastly, since a channel $W$ is both upgraded and degraded with respect to itself, we have by (11) that "equivalent" is a reflexive relation. Thus, channel equivalence is indeed an equivalence relation.

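Let $W : \mathcal{X} \to \mathcal{Y}$ be a given BMS channel. We now set the notation for three quantities of interest.

i) Denote by $P_e(W)$ the probability of error under maximum-likelihood decision, where ties are broken arbitrarily, and the input distribution is Bernoulli(1/2). That is,

$$P_e(W) \overset{\text{def}}{=} \frac{1}{2} \sum_{y \in \mathcal{Y}} \min\{W(y|0), W(y|1)\}. \tag{13}$$

ii) Denote by $Z(W)$ the Bhattacharyya parameter,

$$Z(W) \overset{\text{def}}{=} \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)}. \tag{14}$$

iii) Denote by $I(W)$ the capacity,

$$I(W) \overset{\text{def}}{=} \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}.$$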

The following lemma states that these three quantities behave as expected with respect to the degrading and upgrading relations. The equation most important to us will be (15).

Lemma 3 ([20, page 207]): Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel and let $Q : \mathcal{X} \to \mathcal{Z}$ be degraded with respect to $W$, that is, $Q \preceq W$. Then,

$$P_e(Q) \ge P_e(W), \tag{15}$$
$$Z(Q) \ge Z(W), \tag{16}$$

and

$$I(Q) \le I(W). \tag{17}$$

Moreover, all of the above continues to hold if we replace "degraded" by "upgraded", $\preceq$ by $\succeq$, and reverse the inequalities. Specifically, if $W \equiv Q$, then the weak inequalities are in fact equalities.

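Proof: We consider only the first part, since the "Moreover" part follows easily from Equation (9). For a simple proof of (15), recall the definition of degradation (7), and note that

$$\begin{aligned}
P_e(Q) &= \frac{1}{2} \sum_{z \in \mathcal{Z}} \min\{Q(z|0), Q(z|1)\} \\
&= \frac{1}{2} \sum_{z \in \mathcal{Z}} \min\Bigl\{ \sum_{y \in \mathcal{Y}} W(y|0)\, \mathcal{P}(z|y),\ \sum_{y \in \mathcal{Y}} W(y|1)\, \mathcal{P}(z|y) \Bigr\} \\
&\ge \frac{1}{2} \sum_{z \in \mathcal{Z}} \sum_{y \in \mathcal{Y}} \min\{W(y|0), W(y|1)\} \cdot \mathcal{P}(z|y) \\
&= P_e(W).
\end{aligned}$$

Equation (16) is concisely proved in [13, Lemma 1.8]. Equation (17) is a simple consequence of the data-processing inequality [8, Theorem 2.8.1]. ■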
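As a numerical sanity check of Lemma 3, the sketch below degrades a BSC by cascading it with a second BSC, following (7), and evaluates the three quantities (13), (14) and the capacity. The dictionary representation (output symbol mapped to $(W(y|0), W(y|1))$), the function names, and the choice of intermediate channel are assumptions made for illustration.

```python
from math import log2, sqrt


def bsc(p):
    """BSC(p) as a dict: output symbol -> (W(y|0), W(y|1))."""
    return {0: (1 - p, p), 1: (p, 1 - p)}


def degrade(W, P):
    """Return Q with Q(z|x) = sum_y W(y|x) P(z|y); P maps y -> {z: P(z|y)}."""
    Q = {}
    for y, (w0, w1) in W.items():
        for z, pz in P[y].items():
            q0, q1 = Q.get(z, (0.0, 0.0))
            Q[z] = (q0 + w0 * pz, q1 + w1 * pz)
    return Q


def error_prob(W):
    return 0.5 * sum(min(w0, w1) for w0, w1 in W.values())


def bhattacharyya(W):
    return sum(sqrt(w0 * w1) for w0, w1 in W.values())


def capacity(W):
    """Capacity in bits, using the convention 0 * log 0 = 0."""
    total = 0.0
    for w0, w1 in W.values():
        avg = 0.5 * (w0 + w1)
        total += sum(0.5 * w * log2(w / avg) for w in (w0, w1) if w > 0)
    return total


W = bsc(0.11)
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # hypothetical intermediate BSC(0.1)
Q = degrade(W, P)
print(error_prob(W), error_prob(Q))
print(bhattacharyya(W), bhattacharyya(Q))
print(capacity(W), capacity(Q))
```

In this example the degraded channel is again a BSC, with crossover probability $0.11 \cdot 0.9 + 0.89 \cdot 0.1 = 0.188$.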

Note that it may be the case that $y$ is its own conjugate. That is, $y$ and $\bar{y}$ are the same symbol (an erasure). It would make our proofs simpler if this special case was assumed not to happen. We will indeed assume this later on, with the next lemma providing most of the justification.

Lemma 4: Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel. There exists a BMS channel $W' : \mathcal{X} \to \mathcal{Z}$ such that i) $W'$ is equivalent to $W$, and ii) for all $z \in \mathcal{Z}$ we have that $z$ and $\bar{z}$ are distinct.

Proof: If $W$ is such that for all $y \in \mathcal{Y}$ we have that $y$ and $\bar{y}$ are distinct, then we are done, since we can take $W'$ equal to $W$.

Otherwise, let $y_? \in \mathcal{Y}$ be such that $y_?$ and $\bar{y}_?$ are the same symbol. Let the alphabet $\mathcal{Z}$ be defined as follows:

$$\mathcal{Z} = (\mathcal{Y} \setminus \{y_?\}) \cup \{z_1, z_2\},$$

where $z_1$ and $z_2$ are new symbols, not already in $\mathcal{Y}$. Now, define the channel $W' : \mathcal{X} \to \mathcal{Z}$ as follows. For all $z \in \mathcal{Z}$ and $x \in \mathcal{X}$,

$$W'(z|x) = \begin{cases} W(z|x) & \text{if } z \in \mathcal{Y}, \\ \frac{1}{2} W(y_?|x) & \text{if } z = z_1 \text{ or } z = z_2. \end{cases}$$

We first show that $W' \succeq W$. To see this, take the intermediate channel $\mathcal{P} : \mathcal{Z} \to \mathcal{Y}$ as the channel that maps (with probability 1) $z_1$ and $z_2$ to $y_?$, and all other symbols to themselves. Next, we show that $W' \preceq W$. To see this, define the intermediate channel $\mathcal{P} : \mathcal{Y} \to \mathcal{Z}$ as follows:

$$\mathcal{P}(z|y) = \begin{cases} 1 & \text{if } z = y, \\ \frac{1}{2} & \text{if } y = y_? \text{ and } z \in \{z_1, z_2\}, \\ 0 & \text{otherwise.} \end{cases}$$

To sum up, we have constructed a new channel $W'$ which is equivalent to $W$, and contains one less self-conjugate symbol ($y_?$ was replaced by the pair $z_1, z_2$). It is also easy to see that $W'$ is BMS. We can now apply this construction over and over, until the resulting channel has no self-conjugate symbols. ■
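The construction in the proof of Lemma 4 can be sketched as follows. The representation is an assumption made for illustration: a channel is a dictionary from output symbol to $(W(y|0), W(y|1))$, the conjugation is a second dictionary, and the new symbols are named `(y, 1)` and `(y, 2)`.

```python
def split_self_conjugates(W, conj):
    """Replace every self-conjugate symbol by two symbols of half its probability.

    W:    dict, output symbol -> (W(y|0), W(y|1))
    conj: dict, output symbol -> its conjugate symbol
    """
    W2, conj2 = {}, {}
    for y, (w0, w1) in W.items():
        if conj[y] == y:
            z1, z2 = (y, 1), (y, 2)              # hypothetical names for z_1, z_2
            W2[z1] = W2[z2] = (0.5 * w0, 0.5 * w1)
            conj2[z1], conj2[z2] = z2, z1        # the two halves are conjugates
        else:
            W2[y] = (w0, w1)
            conj2[y] = conj[y]
    return W2, conj2


def error_prob(W):
    return 0.5 * sum(min(w0, w1) for w0, w1 in W.values())


# BEC(0.3): the erasure symbol "e" is its own conjugate.
bec = {0: (0.7, 0.0), 1: (0.0, 0.7), "e": (0.3, 0.3)}
conj = {0: 1, 1: 0, "e": "e"}
bec2, conj2 = split_self_conjugates(bec, conj)
print(sorted(bec2.items(), key=str))
```

Since the new channel is equivalent to the original one, quantities such as $P_e$ are unchanged by the split.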

Now that Lemma 4 is proven, we will indeed assume from this point forward that all channels are BMS and have no output symbols $y$ such that $y$ and $\bar{y}$ are equal. As we will show later on, this assumption does not limit us. Moreover, given a generic BMS channel $W : \mathcal{X} \to \mathcal{Y}$, we will further assume that for all $y \in \mathcal{Y}$, at least one of the probabilities $W(y|0)$ and $W(\bar{y}|0)$ is positive (otherwise, we can remove the pair of symbols $y, \bar{y}$ from the alphabet, since they can never occur).

Given a channel $W : \mathcal{X} \to \mathcal{Y}$, we now define for each output symbol $y \in \mathcal{Y}$ an associated likelihood ratio, denoted $\mathrm{LR}_W(y)$. Specifically,

$$\mathrm{LR}_W(y) \overset{\text{def}}{=} \frac{W(y|0)}{W(y|1)} = \frac{W(y|0)}{W(\bar{y}|0)}$$

(if $W(\bar{y}|0) = 0$, then we must have by assumption that $W(y|0) > 0$, and we define $\mathrm{LR}_W(y) = \infty$). If the channel $W$ is understood from the context, we will abbreviate $\mathrm{LR}_W(y)$ to $\mathrm{LR}(y)$.

IV. High-Level Description of the Algorithms 

In this section, we give a high level description of our algorithms for approximating a bit channel. We then show how these approximations can be used in order to construct a polar code.

In order to completely specify the approximating algorithms, one has to supply two merging functions, a degrading merge function degrading_merge and an upgrading merge function upgrading_merge. We will now define the properties required of our merging functions, leaving the specification of the functions we have actually used to the next section. The next section will also make clear why we have chosen to call these functions "merging".

For a degrading merge function degrading_merge, the following must hold. For a BMS channel $W$ and positive integer $\mu$, the output of degrading_merge($W$, $\mu$) is a BMS channel $Q$ such that i) $Q$ is degraded with respect to $W$, that is, $Q \preceq W$, and ii) the size of the output alphabet of $Q$ is at most $\mu$. We define the properties required of upgrading_merge similarly, but with "degraded" replaced by "upgraded" and $\preceq$ by $\succeq$.

Let $0 \le i < n$ be an integer with binary representation $i = \langle b_1, b_2, \ldots, b_m \rangle_2$, where $b_1$ is the most significant bit. Algorithms A and B contain our procedures for finding a degraded and upgraded approximation of the bit channel $W_i^{(m)}$, respectively. In words, we employ the recursive constructions (5) and (6), taking care to reduce the output alphabet size of each intermediate channel from at most $2\mu^2$ (apart possibly from the underlying channel $W$) to at most $\mu$.

Algorithm A: Bit-channel degrading procedure

input : An underlying BMS channel $W$, a bound $\mu = 2\nu$ on the output alphabet size, a code length $n = 2^m$, an index $i$ with binary representation $i = \langle b_1, b_2, \ldots, b_m \rangle_2$.
output : A BMS channel that is degraded with respect to the bit channel $W_i$.

    Q ← degrading_merge(W, μ)
    for j = 1, 2, ..., m do
        if b_j = 0 then
            W ← Q ⊞ Q
        else
            W ← Q ⊛ Q
        Q ← degrading_merge(W, μ)
    return Q

Algorithm B: Bit-channel upgrading procedure

input : An underlying BMS channel $W$, a bound $\mu = 2\nu$ on the output alphabet size, a code length $n = 2^m$, an index $i$ with binary representation $i = \langle b_1, b_2, \ldots, b_m \rangle_2$.
output : A BMS channel that is upgraded with respect to the bit channel $W_i$.

    Q' ← upgrading_merge(W, μ)
    for j = 1, 2, ..., m do
        if b_j = 0 then
            W ← Q' ⊞ Q'
        else
            W ← Q' ⊛ Q'
        Q' ← upgrading_merge(W, μ)
    return Q'
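The control flow shared by Algorithms A and B can be sketched as below, with the merging function passed in as a parameter. Everything here is illustrative: the dictionary representation and function names are assumptions, and `identity_merge` is only a placeholder. It returns its input unchanged, which is trivially degraded (and upgraded) with respect to itself by (11), but it ignores the bound $\mu$, so the alphabet grows and the sketch is usable only for very small $m$.

```python
from itertools import product


def box_transform(W):
    """W ⊞ W from (5); W maps output symbol -> (W(y|0), W(y|1))."""
    return {
        (y1, y2): tuple(
            0.5 * sum(w1[u1 ^ u2] * w2[u2] for u2 in (0, 1)) for u1 in (0, 1)
        )
        for (y1, w1), (y2, w2) in product(W.items(), repeat=2)
    }


def star_transform(W):
    """W ⊛ W from (6)."""
    return {
        (y1, y2, u1): tuple(0.5 * w1[u1 ^ u2] * w2[u2] for u2 in (0, 1))
        for (y1, w1), (y2, w2) in product(W.items(), repeat=2)
        for u1 in (0, 1)
    }


def error_prob(W):
    return 0.5 * sum(min(w0, w1) for w0, w1 in W.values())


def approximate_bit_channel(W, bits, merge, mu):
    """bits = (b_1, ..., b_m), with b_1 the most significant bit of the index."""
    Q = merge(W, mu)
    for b in bits:
        T = box_transform(Q) if b == 0 else star_transform(Q)
        Q = merge(T, mu)
    return Q


def identity_merge(W, mu):
    """Placeholder only: keeps the channel as is and ignores mu."""
    return W


eps = 0.5
bec = {0: (1 - eps, 0.0), 1: (0.0, 1 - eps), "e": (eps, eps)}
Q = approximate_bit_channel(bec, (0, 1), identity_merge, mu=None)
print(len(Q), error_prob(Q))
```

With a genuine degrading_merge (respectively upgrading_merge) in place of the placeholder, the same loop would return an upper (respectively lower) bound channel whose alphabet never exceeds $\mu$.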



The key to proving the correctness of Algorithms A and B is the following lemma. It is essentially a restatement of [13, Lemma 4.7]. For completeness, we restate the proof as well.

Lemma 5: Fix a binary input channel $W : \mathcal{X} \to \mathcal{Y}$, and denote

$$W^{\boxplus} = W \boxplus W, \qquad W^{\circledast} = W \circledast W.$$

Next, let $Q \preceq W$ be degraded with respect to $W$, and denote

$$Q^{\boxplus} = Q \boxplus Q, \qquad Q^{\circledast} = Q \circledast Q.$$

Then,

$$Q^{\boxplus} \preceq W^{\boxplus} \quad \text{and} \quad Q^{\circledast} \preceq W^{\circledast}.$$

Namely, the degradation relation is preserved by the channel transformation operation.

Moreover, all of the above continues to hold if we replace "degraded" by "upgraded" and $\preceq$ by $\succeq$.

Proof: We will prove only the "degraded" part, since it implies the "upgraded" part (by interchanging the roles of $W$ and $Q$).

Let $\mathcal{P} : \mathcal{Y} \to \mathcal{Z}$ be the channel which degrades $W$ to $Q$: for all $z \in \mathcal{Z}$ and $x \in \mathcal{X}$,

$$Q(z|x) = \sum_{y \in \mathcal{Y}} W(y|x)\, \mathcal{P}(z|y). \tag{18}$$

We first prove $Q^{\boxplus} \preceq W^{\boxplus}$. By (5) applied to $Q$, we get that for all $(z_1, z_2) \in \mathcal{Z}^2$ and $u_1 \in \mathcal{X}$,

$$Q^{\boxplus}((z_1, z_2)|u_1) = \sum_{u_2 \in \mathcal{X}} \frac{1}{2}\, Q(z_1|u_1 \oplus u_2)\, Q(z_2|u_2).$$

Next, we expand $Q$ twice according to (18), and get

$$Q^{\boxplus}((z_1, z_2)|u_1) = \sum_{(y_1, y_2) \in \mathcal{Y}^2} \sum_{u_2 \in \mathcal{X}} \frac{1}{2}\, W(y_1|u_1 \oplus u_2)\, W(y_2|u_2)\, \mathcal{P}(z_1|y_1)\, \mathcal{P}(z_2|y_2).$$

By (5), this reduces to

$$Q^{\boxplus}((z_1, z_2)|u_1) = \sum_{(y_1, y_2) \in \mathcal{Y}^2} W^{\boxplus}((y_1, y_2)|u_1)\, \mathcal{P}(z_1|y_1)\, \mathcal{P}(z_2|y_2). \tag{19}$$

Next, define the channel $\mathcal{P}^* : \mathcal{Y}^2 \to \mathcal{Z}^2$ as follows. For all $(y_1, y_2) \in \mathcal{Y}^2$ and $(z_1, z_2) \in \mathcal{Z}^2$,

$$\mathcal{P}^*((z_1, z_2)|(y_1, y_2)) = \mathcal{P}(z_1|y_1)\, \mathcal{P}(z_2|y_2).$$

It is easy to prove that $\mathcal{P}^*$ is indeed a channel (we get a probability distribution on $\mathcal{Z}^2$ for every fixed $(y_1, y_2) \in \mathcal{Y}^2$). Thus, (19) reduces to

$$Q^{\boxplus}((z_1, z_2)|u_1) = \sum_{(y_1, y_2) \in \mathcal{Y}^2} W^{\boxplus}((y_1, y_2)|u_1)\, \mathcal{P}^*((z_1, z_2)|(y_1, y_2)),$$

and we get by (7) that $Q^{\boxplus} \preceq W^{\boxplus}$. The claim $Q^{\circledast} \preceq W^{\circledast}$ is proved in much the same way. ■

Proposition 6: The output of Algorithm A (Algorithm B) is a BMS channel that is degraded (upgraded) with respect to the bit channel $W_i$.

Proof: The proof follows easily from Lemma 5, by induction on $j$. ■

Recall that, ideally, a polar code is constructed as follows. We are given an underlying channel $W : \mathcal{X} \to \mathcal{Y}$, a specified codeword length $n = 2^m$, and a target block error rate $e_{\text{Block}}$. We choose the largest possible subset of bit-channels $W_i$ such that the sum of their probabilities of error $P_e(W_i)$ is not greater than $e_{\text{Block}}$. The resulting code is spanned by the rows in $B_n G^{\otimes m}$ corresponding to the subset of chosen bit-channels. Denote the rate of this code as $R_{\text{exact}}$.

Since we have no computational handle on the bit channels $W_i$, we must resort to approximations. Let $Q_i$ be the result of running Algorithm A on $W$ and $i$. Since $Q_i \preceq W_i$, we have by (15) that $P_e(Q_i) \ge P_e(W_i)$. Note that since the output alphabet of $Q_i$ is small (at most $\mu$), we can actually compute $P_e(Q_i)$. We now mimic the ideal construction by choosing the largest possible subset of indices for which the sum of $P_e(Q_i)$ is at most $e_{\text{Block}}$. Note that for this subset we have that the sum of $P_e(W_i)$ is at most $e_{\text{Block}}$ as well. Thus, the code spanned by the corresponding rows of $B_n G^{\otimes m}$ is assured to have block error probability of at most $e_{\text{Block}}$.

Denote the rate of this code by $R_{\text{degraded}}$. It is easy to see that $R_{\text{degraded}} \le R_{\text{exact}}$. In order to gauge the difference between the two rates, we compute a third rate, $R_{\text{upgraded}}$, such that $R_{\text{upgraded}} \ge R_{\text{exact}}$, and consider the difference $R_{\text{upgraded}} - R_{\text{degraded}}$. The rate $R_{\text{upgraded}}$ is computed the same way that $R_{\text{degraded}}$ is, but instead of using Algorithm A we use Algorithm B. Recall from Figure 2 that $R_{\text{degraded}}$ and $R_{\text{upgraded}}$ are typically very close.
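The selection step described above (take the largest index set whose summed upper bounds stay within the target) can be sketched as follows. To keep the example self-contained, the upper bounds are generated for a BEC, where the standard erasure-probability recursion ($z \mapsto 2z - z^2$ for $\boxplus$, $z \mapsto z^2$ for $\circledast$) gives the bit-channel error probabilities exactly; these values merely stand in for the quantities $P_e(Q_i)$ that Algorithm A would produce for a general channel. The function names and the target 0.2 are illustrative assumptions.

```python
def bec_error_probs(eps, m):
    """P_e of all n = 2**m bit-channels of BEC(eps), index bits b_1..b_m (MSB first)."""
    level = [eps]
    for _ in range(m):
        # b = 0 -> W ⊞ W (erasure 2z - z^2); b = 1 -> W ⊛ W (erasure z^2)
        level = [f for z in level for f in (2 * z - z * z, z * z)]
    return [z / 2 for z in level]


def choose_information_set(pe_upper, e_block):
    """Largest index set whose summed upper bounds stay at or below e_block."""
    order = sorted(range(len(pe_upper)), key=lambda i: pe_upper[i])
    chosen, total = [], 0.0
    for i in order:
        if total + pe_upper[i] > e_block:
            break
        chosen.append(i)
        total += pe_upper[i]
    return sorted(chosen), total


pe = bec_error_probs(0.5, 3)
info_set, bound = choose_information_set(pe, 0.2)
print(info_set, len(info_set) / len(pe), bound)
```

Taking indices in increasing order of their bound is what makes the chosen set as large as possible; running the same selection on lower bounds from Algorithm B would give the companion rate $R_{\text{upgraded}}$.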

We end this section by noting a point that will be needed in the proof of Theorem 14 below. Consider the running time needed in order to approximate all $n$ bit channels. Assume that each invocation of either degrading_merge or upgrading_merge takes time $\tau = \tau(\mu)$. Thus, the time needed for approximating a single bit channel using either Algorithm A or Algorithm B is $O(m\tau)$. A naive analysis suggests that the time needed in order to approximate all $n$ bit-channels is $O(n \cdot m\tau)$. However, significant savings can be gained by noticing that intermediate calculations can be shared between bit-channels. For example, in a naive implementation we would approximate $W \boxplus W$ over and over again, $n/2$ times instead of only once. A quick calculation shows that the number of distinct channels one needs to approximate is $2n - 2$. That is, following both branches of the "if" statement of (without loss of generality) Algorithm A would produce $2^j$ channels for each level $1 \le j \le m$. Thus, the total running time can be reduced to $O((2n - 2) \cdot \tau)$, which is simply $O(n \cdot \tau)$.

V. Merging Functions

In this section, we specify the degrading and upgrading functions used to reduce the output alphabet size. These functions are referred to as degrading_merge and upgrading_merge in Algorithms A and B, respectively. For now, let us treat our functions as heuristic (delaying their analysis to Section VIII).

Proof: Take the intermediate channel $\mathcal{P} : \mathcal{Y} \to \mathcal{Z}$ as the channel that maps with probability 1 as follows (see Figure 4(b)): both $y_1$ and $y_2$ map to $z_{1,2}$, both $\bar{y}_1$ and $\bar{y}_2$ map to $\bar{z}_{1,2}$, other symbols map to themselves. Recall that we have assumed that $W$ does not contain an erasure symbol, and this continues to hold for $Q$. ■

We now define the degrading_merge function we have used. It gives good results in practice and is amenable to a fast implementation. Assume we are given a BMS channel $W : \mathcal{X} \to \mathcal{Y}$ with an alphabet size of $2L$ (recall our assumption of no self-conjugates), and wish to transform $W$ into a degraded version of itself while reducing its alphabet size to $\mu$. If $2L \le \mu$, then we are done, since we can take the degraded version of $W$ to be $W$ itself. Otherwise, we do the following. Recall that for each $y$ we have that $\mathrm{LR}(\bar{y}) = 1/\mathrm{LR}(y)$, where in this context $1/0 = \infty$ and $1/\infty = 0$. Thus, our first step is to choose from each pair $(y, \bar{y})$ a representative such that $\mathrm{LR}(y) \ge 1$. Next, we order these $L$ representatives such that

$$1 \le \mathrm{LR}(y_1) \le \mathrm{LR}(y_2) \le \cdots \le \mathrm{LR}(y_L). \tag{20}$$

We now ask the following: for which index $1 \le i \le L - 1$ does the channel resulting from the application of Lemma 7 to $W$, $y_i$, and $y_{i+1}$ result in a channel with largest capacity? Note that instead of considering $\binom{L}{2}$ merges, we consider only $L - 1$. After finding the maximizing index $i$ we indeed apply Lemma 7 and get a degraded channel $Q$ with an alphabet size smaller by 2 than that of $W$. The same process is applied to $Q$, until the output alphabet size is not more than $\mu$.

In light of Lemma 7 and (20), a simple yet important point to note is that if $y_i$ and $y_{i+1}$ are merged to $z$, then

$$\mathrm{LR}(y_i) \le \mathrm{LR}(z) \le \mathrm{LR}(y_{i+1}). \tag{21}$$



A. Degrading-merge function 

We first note that the problem of degrading a binary-input 
channel to a channel with a prescribed output alphabet size was 
independently considered by Kurkoski and Yagi 1 15 1. The main 
result in |15J is an optimal degrading strategy, in the sense that 
the capacity of the resulting channel is the largest possible. In 
this respect, the method we now introduce is sub-optimal. How- 
ever, as we will show, the complexity of our method is superior 
to that presented in p3) . 

The next lemma shows how one can reduce the output alpha- 
bet size by 2, and get a degraded channel. It is our first step 
towards defining a valid degrading_merge function. 

Lemma 7: Let W : X ^ y h& 2l BMS channel, and let yi 
and y2 be symbols in the output alphabet y. Define the channel 
Q : — > 2 as follows (see Figure 4(a) 1. The output alphabet 
Z is given by 

2 = y\{yi,yi,y2,y2} u {zi^2/Zi,2} ■ 

For all X e and z & Z, define 



Namely, the original LR ordering is essentially preserved by the 
merging operation. Algorithm]C]contains an implementation of 
our merging procedure. It relies on the above observation in or- 
der to improve complexity and runs in 0(L ■ log L) time. Thus, 
assuming L is at most 2f/^, the running time of our algorithm 
is 0{it^ log pi). In contrast, had we used the degrading method 
presented in 1 15 1, the running time would have been 0(/i^). 

Our implementation assumes an underlying data structure 
and data elements as follows. Our data structure stores data 
elements, where each data element corresponds to a pair of ad- 



Q(z|x) 



r W(z|x) 
^ W(yi|x) 
[yV{yi\x) 



W(y2|x) 
W(y2|x) 



ifz ^ {Zl,2, Zu), 
if Z = Zi^2' 
if Z = Zi^2- 



jacent letters y, and y/+i, in the sense of the ordering in (20 1. 
Each data element has the following fields: 

a , h , a' , b' , deltal , dLeft , dRight , h . 

The fields a, b, a', and b' store the probabiUties W(y/|0), W(y/ 10), 
>V(y;+l |0), and >V(y,+i |0), respectively. The field deltal con- 
tains the difference in capacity that would result from applying 
Lemma]7]to y; and yt+i- Note that deltal is only a function of 
the above four probabilities, and thus the function calcDeltal 
used to initialize this field is given by 



calcDeltal{a,b,a',b') 
where 



C{a,b) + C{a',b')-C{a+,b+) , 



Then Q ^ >V. That is, Q is degraded with respect to W. 



C{a,b) = -{a + b)\og2{{a + b) /2) + alog2{a) + i>log2{i>) , 



Algorithm C: The degrading_merge function

input : A BMS channel $W : \mathcal{X} \to \mathcal{Y}$ where $|\mathcal{Y}| = 2L$, a bound $\mu = 2\nu$ on the output alphabet size.
output: A degraded channel $Q : \mathcal{X} \to \mathcal{Y}'$, where $|\mathcal{Y}'| \le \mu$.

// Assume $1 \le LR(y_1) \le LR(y_2) \le \cdots \le LR(y_L)$
for i = 1, 2, ..., L - 1 do
    d ← new data element
    d.a ← $W(y_i|0)$, d.b ← $W(\bar{y}_i|0)$
    d.a' ← $W(y_{i+1}|0)$, d.b' ← $W(\bar{y}_{i+1}|0)$
    d.deltaI ← calcDeltaI(d.a, d.b, d.a', d.b')
    insertRightmost(d)
ℓ ← L
while ℓ > ν do
    d ← getMin()
    a+ ← d.a + d.a', b+ ← d.b + d.b'
    dLeft ← d.dLeft
    dRight ← d.dRight
    removeMin()
    ℓ ← ℓ - 1
    if dLeft ≠ null then
        dLeft.a' ← a+
        dLeft.b' ← b+
        dLeft.deltaI ← calcDeltaI(dLeft.a, dLeft.b, a+, b+)
        valueUpdated(dLeft)
    if dRight ≠ null then
        dRight.a ← a+
        dRight.b ← b+
        dRight.deltaI ← calcDeltaI(a+, b+, dRight.a', dRight.b')
        valueUpdated(dRight)
Construct Q according to the probabilities in the data structure and return it.



we use the shorthand
\[ a^+ = a + a' , \qquad b^+ = b + b' , \]
and $0 \log_2(0)$ is defined as 0. The field dLeft is a pointer to the data element corresponding to the pair $y_{i-1}$ and $y_i$ (or "null", if $i = 1$). Likewise, dRight is a pointer to the element corresponding to the pair $y_{i+1}$ and $y_{i+2}$ (see Figure 5 for a graphical depiction). Apart from these, each data element contains an integer field h, which will be discussed shortly.



Before the merge of $y_i$ and $y_{i+1}$:

    ... ↔ (y_{i-1}, y_i) ↔ (y_i, y_{i+1}) ↔ (y_{i+1}, y_{i+2}) ↔ ...
              dLeft          merged to z          dRight

After the merge, a new symbol $z$:

    ... ↔ (y_{i-1}, z) ↔ (z, y_{i+2}) ↔ ...

Fig. 5. Graphical depiction of the doubly-linked-list before and after a merge.

We now discuss the functions that are the interface to our data structure: insertRightmost, getMin, removeMin, and valueUpdated. Our data structure combines the attributes of a doubly-linked-list [7, Section 10.2] and a heap [7, Chapter 6]. The doubly-linked list is implemented through the dLeft and dRight fields of each data element, as well as a pointer to the rightmost element of the list. Our heap will have the "array" implementation, as described in [7, Section 6.1]. Thus, each data element will have a corresponding index in the heap array, and this index is stored in the field h. The doubly-linked-list will be ordered according to the corresponding LR value, while the heap will be sorted according to the deltaI field.

The function insertRightmost inserts a data element as the rightmost element of the list and updates the heap accordingly. The function getMin returns the data element with smallest deltaI, namely, the data element corresponding to the pair of symbols we are about to merge. The function removeMin removes the element returned by getMin from both the linked-list and the heap. The function valueUpdated updates the heap due to a change in deltaI resulting from a merge, but does not change the linked list, in view of (21).

The running time of getMin is $O(1)$, and this is obviously also the case for calcDeltaI. Due to the need of updating the heap, the running time of removeMin, valueUpdated, and insertRightmost is $O(\log L)$. The time needed for the initial sort of the LR pairs is $O(L \cdot \log L)$. Hence, since the initializing for-loop in Algorithm C has $L - 1$ iterations and the while-loop has $L - \nu$ iterations, the total running time of Algorithm C is $O(L \cdot \log L)$.

Footnote: In short, a heap is a data structure that supports four operations: "insert", "getMin", "removeMin", and "valueUpdated". In our implementation, the running time of "getMin" is constant, while the running time of the other operations is logarithmic in the heap size.
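The merging procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: a channel is represented as a hypothetical list of pairs (a, b) = (W(y|0), W(ȳ|0)) with a ≥ b, the linked list is kept over symbols rather than over pairs of adjacent symbols, and the indexed heap with valueUpdated is replaced by Python's heapq with lazy deletion of stale entries (a swapped-in technique that keeps the O(L log L) behaviour). The names cap, calc_delta_i, and degrading_merge are ours.

```python
import heapq
import math

def cap(a, b):
    """C(a, b) from the text, with 0*log2(0) taken as 0."""
    def xlog2(x):
        return x * math.log2(x) if x > 0 else 0.0
    s = a + b
    return -xlog2(s) + s + xlog2(a) + xlog2(b)

def calc_delta_i(a, b, a2, b2):
    """Capacity lost when two adjacent pairs are merged as in Lemma 7."""
    return cap(a, b) + cap(a2, b2) - cap(a + a2, b + b2)

def degrading_merge(pairs, mu):
    """Greedily merge adjacent pairs until at most mu // 2 pairs remain."""
    nu = mu // 2
    # Sort the representatives by likelihood ratio a/b (b == 0 means LR = infinity).
    nodes = [list(p) for p in
             sorted(pairs, key=lambda p: p[0] / p[1] if p[1] > 0 else math.inf)]
    n = len(nodes)
    left = list(range(-1, n - 1))    # left[i] == -1: no left neighbour
    right = list(range(1, n + 1))    # right[i] == n: no right neighbour
    alive = [True] * n
    version = [0] * n                # bumped whenever a node's probabilities change
    heap = []

    def push(i):
        j = right[i]
        if j < n:
            d = calc_delta_i(*nodes[i], *nodes[j])
            heapq.heappush(heap, (d, i, j, version[i], version[j]))

    for i in range(n - 1):
        push(i)
    count = n
    while count > nu:
        _, i, j, vi, vj = heapq.heappop(heap)
        # Lazy deletion: skip entries that refer to an outdated state.
        if not (alive[i] and alive[j] and right[i] == j
                and version[i] == vi and version[j] == vj):
            continue
        nodes[i][0] += nodes[j][0]   # merge symbol j into symbol i
        nodes[i][1] += nodes[j][1]
        version[i] += 1
        alive[j] = False
        right[i] = right[j]
        if right[j] < n:
            left[right[j]] = i
        count -= 1
        push(i)                      # new delta with the right neighbour
        if left[i] >= 0:
            push(left[i])            # new delta with the left neighbour
    return [tuple(nodes[i]) for i in range(n) if alive[i]]
```

Because of (21), the surviving pairs are returned already ordered by likelihood ratio, so the output can be fed directly into the next invocation.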

Note that at first sight, it may seem as though there might be an even better heuristic to employ. As before, assume that the $y_i$ are ordered according to their likelihood ratios, and all of these are at least 1. Instead of limiting the application of Lemma 7 to $y_i$ and $y_{i+1}$, we can broaden our search and consider the penalty in capacity incurred by merging arbitrary $y_i$ and $y_j$, where $i \ne j$. Indeed, we could further consider merging arbitrary $y_i$ and $\bar{y}_j$, where $i \ne j$. Clearly, this broader search will result in worse complexity. However, as the next theorem shows, we will essentially gain nothing by it.

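Theorem 8: Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel, with
\[ \mathcal{Y} = \{y_1, y_2, \ldots, y_L, \bar{y}_1, \bar{y}_2, \ldots, \bar{y}_L\} . \]
Assume that
\[ 1 \le LR(y_1) \le LR(y_2) \le \cdots \le LR(y_L) . \]
For symbols $w_1, w_2 \in \mathcal{Y}$, denote by $I(w_1, w_2)$ the capacity of the channel one gets by the application of Lemma 7 to $w_1$ and $w_2$. Then, for all distinct $1 \le i \le L$ and $1 \le j \le L$,
\[ I(\bar{y}_i, \bar{y}_j) = I(y_i, y_j) \ge I(y_i, \bar{y}_j) = I(\bar{y}_i, y_j) . \tag{22} \]
Moreover, for all $1 \le i < j < k \le L$ we have that either
\[ I(y_i, y_j) \ge I(y_i, y_k) , \]
or
\[ I(y_j, y_k) \ge I(y_i, y_k) . \]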

We note that Theorem 8 seems very much related to [15, Lemma 5]. However, one important difference is that Theorem 8 deals with the case in which the degraded channel is constrained to be symmetric, while [15, Lemma 5] does not. At any rate, for completeness, we will prove Theorem 8 in Appendix A.
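The statement of Theorem 8 can be sanity-checked numerically. The sketch below is an illustration only, not a proof: it draws random channels, represented as hypothetical lists of pairs (a, b) = (W(y|0), W(ȳ|0)) with a ≥ b > 0, and compares the smallest capacity loss over adjacent merges with the smallest loss over all merges, including merges of a symbol with the conjugate of another. The helper names are ours.

```python
import itertools
import random
import math

def cap(a, b):
    """C(a, b) from the text, with 0*log2(0) taken as 0."""
    def xlog2(x):
        return x * math.log2(x) if x > 0 else 0.0
    s = a + b
    return -xlog2(s) + s + xlog2(a) + xlog2(b)

def merge_loss(p, q):
    """Capacity lost when the pairs p and q are merged as in Lemma 7."""
    return cap(*p) + cap(*q) - cap(p[0] + q[0], p[1] + q[1])

def random_channel(L, rng):
    """L symbol pairs with positive probabilities, normalised to total mass 1."""
    raw = [(rng.random() + 1e-3, rng.random() + 1e-3) for _ in range(L)]
    raw = [(max(a, b), min(a, b)) for a, b in raw]
    total = sum(a + b for a, b in raw)
    return [(a / total, b / total) for a, b in raw]

def best_losses(pairs):
    """Return (best adjacent loss, best loss over every possible merge)."""
    pairs = sorted(pairs, key=lambda p: p[0] / p[1])
    adjacent = min(merge_loss(pairs[i], pairs[i + 1])
                   for i in range(len(pairs) - 1))
    # q[::-1] swaps the roles of y_j and its conjugate.
    everything = min(min(merge_loss(p, q), merge_loss(p, q[::-1]))
                     for p, q in itertools.combinations(pairs, 2))
    return adjacent, everything
```

In every trial we would expect the two returned values to coincide up to rounding, which is what restricting the search to the L − 1 adjacent pairs relies on.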

B. Upgrading-merge functions 

The fact that one can merge symbol pairs and get a degraded 
version of the original channel should come as no surprise. How- 
ever, it turns out that we can also merge symbol pairs and get 
an upgraded version of the original channel. We first show a 
simple method of doing this. Later on, we will show a slightly 
more complex method, and compare between the two. 

As in the degrading case, we show how to reduce the out- 
put alphabet size by 2, and then apply this method repeatedly 
as much as needed. The following lemma shows how the core 
reduction can be carried out. The intuition behind it is simple. 
Namely, now we "promote" a pair of output symbols to have a 
higher LR value, and then merge with an existing pair having 
that LR. 

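Lemma 9: Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel, and let $y_2$ and $y_1$ be symbols in the output alphabet $\mathcal{Y}$. Denote $\lambda_2 = LR(y_2)$ and $\lambda_1 = LR(y_1)$. Assume that
\[ 1 \le \lambda_1 \le \lambda_2 . \tag{23} \]
Next, let $a_1 = W(y_1|0)$ and $b_1 = W(\bar{y}_1|0)$. Define $\alpha_2$ and $\beta_2$ as follows. If $\lambda_2 < \infty$,
\[ \alpha_2 = \lambda_2 \frac{a_1 + b_1}{\lambda_2 + 1} , \qquad \beta_2 = \frac{a_1 + b_1}{\lambda_2 + 1} . \tag{24} \]
Otherwise, we have $\lambda_2 = \infty$, and so define
\[ \alpha_2 = a_1 + b_1 , \qquad \beta_2 = 0 . \tag{25} \]
We note that the subscript "2" in $\alpha_2$ and $\beta_2$ is meant to suggest a connection to $\lambda_2$, since $\alpha_2 / \beta_2 = \lambda_2$.

For real numbers $\alpha$, $\beta$, and $x \in \mathcal{X}$, define
\[ t(\alpha, \beta | x) = \begin{cases} \alpha & \text{if } x = 0, \\ \beta & \text{if } x = 1. \end{cases} \]
Define the channel $Q' : \mathcal{X} \to \mathcal{Z}'$ as follows (see Figure 6(a)). The output alphabet $\mathcal{Z}'$ is given by
\[ \mathcal{Z}' = \mathcal{Y} \setminus \{y_2, \bar{y}_2, y_1, \bar{y}_1\} \cup \{z_2, \bar{z}_2\} . \]
For all $x \in \mathcal{X}$ and $z \in \mathcal{Z}'$,
\[ Q'(z|x) = \begin{cases} W(z|x) & \text{if } z \notin \{z_2, \bar{z}_2\}, \\ W(y_2|x) + t(\alpha_2, \beta_2 | x) & \text{if } z = z_2, \\ W(\bar{y}_2|x) + t(\beta_2, \alpha_2 | x) & \text{if } z = \bar{z}_2. \end{cases} \]
Then $Q' \succcurlyeq W$. That is, $Q'$ is upgraded with respect to $W$.

Proof: Denote $a_2 = W(y_2|0)$ and $b_2 = W(\bar{y}_2|0)$. First, note that
\[ a_1 + b_1 = \alpha_2 + \beta_2 . \]
Next, let $\gamma$ be defined as follows. If $\lambda_2 > 1$, let
\[ \gamma = \frac{a_1 - \beta_2}{\alpha_2 - \beta_2} = \frac{b_1 - \alpha_2}{\beta_2 - \alpha_2} , \]
and note that (23) implies that $0 \le \gamma \le 1$. Otherwise ($\lambda_1 = \lambda_2 = 1$), let
\[ \gamma = 1 . \]
Define the intermediate channel $\mathcal{P} : \mathcal{Z}' \to \mathcal{Y}$ as follows.
\[ \mathcal{P}(y|z) = \begin{cases} 1 & \text{if } z \notin \{z_2, \bar{z}_2\} \text{ and } y = z, \\ \frac{\alpha_2 \gamma}{a_2 + \alpha_2} & \text{if } (z, y) \in \{(z_2, y_1), (\bar{z}_2, \bar{y}_1)\}, \\ \frac{a_2}{a_2 + \alpha_2} & \text{if } (z, y) \in \{(z_2, y_2), (\bar{z}_2, \bar{y}_2)\}, \\ \frac{\alpha_2 (1 - \gamma)}{a_2 + \alpha_2} & \text{if } (z, y) \in \{(z_2, \bar{y}_1), (\bar{z}_2, y_1)\}, \\ 0 & \text{otherwise.} \end{cases} \]
Notice that when $\lambda_2 < \infty$, we have that
\[ \frac{a_2}{a_2 + \alpha_2} = \frac{b_2}{b_2 + \beta_2} \quad \text{and} \quad \frac{\alpha_2}{a_2 + \alpha_2} = \frac{\beta_2}{b_2 + \beta_2} . \]
Some simple calculations finish the proof. ■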
The following corollary shows that we do not "lose anything" when applying Lemma 9 to symbols $y_1$ and $y_2$ such that $LR(y_1) = LR(y_2)$. Thus, intuitively, we do not expect to lose much when applying Lemma 9 to symbols with "close" LR values.

Corollary 10: Let $W$, $Q'$, $y_1$, and $y_2$ be as in Lemma 9. If $LR(y_1) = LR(y_2)$, then $Q' \equiv W$. That is, $W$ and $Q'$ are equivalent. Moreover, all of the above holds if we replace "Lemma 9" by "Lemma 7".

Proof: The proof follows by noticing that the channel $Q'$ we get by applying Lemma 9 to $W$, $y_1$, and $y_2$, is exactly the same channel we get if we apply Lemma 7 instead. Thus, we have both $Q' \succcurlyeq W$ and $Q' \preccurlyeq W$. ■

In Lemma 9 we have essentially transferred the probability $W(y_1|0) + W(\bar{y}_1|0)$ onto a symbol pair with a higher LR value. We now show a different method of merging that involves dividing the probability $W(y_1|0) + W(\bar{y}_1|0)$ between a symbol pair with higher LR value and a symbol pair with lower LR value. As we will prove later on, this new approach is generally preferable.

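Lemma 11: Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel, and let $y_1$, $y_2$, and $y_3$ be symbols in the output alphabet $\mathcal{Y}$. Denote $\lambda_1 = LR(y_1)$, $\lambda_2 = LR(y_2)$, and $\lambda_3 = LR(y_3)$. Assume that
\[ 1 \le \lambda_1 < \lambda_2 < \lambda_3 . \]
Next, let $a_2 = W(y_2|0)$ and $b_2 = W(\bar{y}_2|0)$. Define $\alpha_1$, $\beta_1$, $\alpha_3$, $\beta_3$ as follows. If $\lambda_3 < \infty$,
\[ \alpha_1 = \lambda_1 \frac{\lambda_3 b_2 - a_2}{\lambda_3 - \lambda_1} , \qquad \beta_1 = \frac{\lambda_3 b_2 - a_2}{\lambda_3 - \lambda_1} , \tag{26} \]
\[ \alpha_3 = \lambda_3 \frac{a_2 - \lambda_1 b_2}{\lambda_3 - \lambda_1} , \qquad \beta_3 = \frac{a_2 - \lambda_1 b_2}{\lambda_3 - \lambda_1} . \tag{27} \]
Otherwise, we have $\lambda_3 = \infty$, and so define
\[ \alpha_1 = \lambda_1 b_2 , \qquad \beta_1 = b_2 , \tag{28} \]
\[ \alpha_3 = a_2 - \lambda_1 b_2 , \qquad \beta_3 = 0 . \tag{29} \]
Let $t(\alpha, \beta | x)$ be as in Lemma 9 and define the BMS channel $Q' : \mathcal{X} \to \mathcal{Z}'$ as follows (see Figure 7(a)). The output alphabet $\mathcal{Z}'$ is given by
\[ \mathcal{Z}' = \mathcal{Y} \setminus \{y_1, \bar{y}_1, y_2, \bar{y}_2, y_3, \bar{y}_3\} \cup \{z_1, \bar{z}_1, z_3, \bar{z}_3\} . \]
For all $x \in \mathcal{X}$ and $z \in \mathcal{Z}'$, define
\[ Q'(z|x) = \begin{cases} W(z|x) & \text{if } z \notin \{z_1, \bar{z}_1, z_3, \bar{z}_3\}, \\ W(y_1|x) + t(\alpha_1, \beta_1 | x) & \text{if } z = z_1, \\ W(\bar{y}_1|x) + t(\beta_1, \alpha_1 | x) & \text{if } z = \bar{z}_1, \\ W(y_3|x) + t(\alpha_3, \beta_3 | x) & \text{if } z = z_3, \\ W(\bar{y}_3|x) + t(\beta_3, \alpha_3 | x) & \text{if } z = \bar{z}_3. \end{cases} \]
Then $Q' \succcurlyeq W$. That is, $Q'$ is upgraded with respect to $W$.

Proof: Denote $a_1 = W(y_1|0)$, $b_1 = W(\bar{y}_1|0)$, $a_3 = W(y_3|0)$, and $b_3 = W(\bar{y}_3|0)$. Define the intermediate channel $\mathcal{P} : \mathcal{Z}' \to \mathcal{Y}$ as follows.
\[ \mathcal{P}(y|z) = \begin{cases} 1 & \text{if } z \notin \{z_3, \bar{z}_3, z_1, \bar{z}_1\} \text{ and } y = z, \\ \frac{a_1}{a_1 + \alpha_1} = \frac{b_1}{b_1 + \beta_1} & \text{if } (z, y) \in \{(z_1, y_1), (\bar{z}_1, \bar{y}_1)\}, \\ \frac{\alpha_1}{a_1 + \alpha_1} = \frac{\beta_1}{b_1 + \beta_1} & \text{if } (z, y) \in \{(z_1, y_2), (\bar{z}_1, \bar{y}_2)\}, \\ \frac{a_3}{a_3 + \alpha_3} & \text{if } (z, y) \in \{(z_3, y_3), (\bar{z}_3, \bar{y}_3)\}, \\ \frac{\alpha_3}{a_3 + \alpha_3} & \text{if } (z, y) \in \{(z_3, y_2), (\bar{z}_3, \bar{y}_2)\}, \\ 0 & \text{otherwise.} \end{cases} \]
Notice that when $\lambda_3 < \infty$, we have that
\[ \frac{a_3}{a_3 + \alpha_3} = \frac{b_3}{b_3 + \beta_3} \quad \text{and} \quad \frac{\alpha_3}{a_3 + \alpha_3} = \frac{\beta_3}{b_3 + \beta_3} . \]
The proof follows by observing that, whatever the value of $\lambda_3$,
\[ \alpha_1 + \alpha_3 = a_2 \quad \text{and} \quad \beta_1 + \beta_3 = b_2 . \]
■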

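Fig. 7. Second method of upgrading $W$ to $Q'$. (a) The upgrading merge operation. (b) The intermediate channel $\mathcal{P}$.

The following lemma formalizes why Lemma 11 results in a merging operation that is better than that of Lemma 9.

Lemma 12: Let $W$, $y_1$, $y_2$, and $y_3$ be as in Lemma 11. Denote by $Q'_{123} : \mathcal{X} \to \mathcal{Z}'_{123}$ the result of applying Lemma 11 to $W$, $y_1$, $y_2$, and $y_3$. Next, denote by $Q'_{23} : \mathcal{X} \to \mathcal{Z}'_{23}$ the result of applying Lemma 9 to $W$, $y_2$, and $y_3$. Then $Q'_{23} \succcurlyeq Q'_{123} \succcurlyeq W$. Namely, in a sense, $Q'_{123}$ is a more faithful representation of $W$ than $Q'_{23}$ is.

Proof: Recall that the two alphabets $\mathcal{Z}'_{123}$ and $\mathcal{Z}'_{23}$ satisfy
\[ \mathcal{Z}'_{123} = \{z_1, \bar{z}_1, z_3, \bar{z}_3\} \cup \mathcal{A} , \]
\[ \mathcal{Z}'_{23} = \{y_1, \bar{y}_1, z_3, \bar{z}_3\} \cup \mathcal{A} , \]
where
\[ \mathcal{A} = \mathcal{Y} \setminus \{y_1, \bar{y}_1, y_2, \bar{y}_2, y_3, \bar{y}_3\} \]
is the set of symbols not participating in either merge operation.

In order to prove that $Q'_{123}$ is degraded with respect to $Q'_{23}$, we must supply a corresponding intermediate channel $\mathcal{P} : \mathcal{Z}'_{23} \to \mathcal{Z}'_{123}$. To this end, let
\[ \lambda_3 = \frac{Q'_{123}(z_3|0)}{Q'_{123}(z_3|1)} = \frac{Q'_{23}(z_3|0)}{Q'_{23}(z_3|1)} = \frac{W(y_3|0)}{W(y_3|1)} \]
and
\[ \gamma = \frac{Q'_{123}(z_3|0)}{Q'_{23}(z_3|0)} = \frac{Q'_{123}(z_3|1)}{Q'_{23}(z_3|1)} . \]
Note that in both Lemma 11 and Lemma 9 we have that $\alpha_3 / \beta_3 = \lambda_3$. Next, we recall that in Lemma 9 we have that $\alpha_3 + \beta_3 = a_2 + b_2$, whereas in Lemma 11 we have $\alpha_3 + \beta_3 = a_2 + b_2 - \alpha_1 - \beta_1$. Thus, we conclude that $0 \le \gamma \le 1$. Moreover, since $0 \le \gamma \le 1$, we conclude that the following definition of an intermediate channel is indeed valid.
\[ \mathcal{P}(z_{123}|z_{23}) = \begin{cases} 1 & \text{if } z_{123} = z_{23} \text{ and } z_{123} \in \mathcal{A}, \\ 1 & \text{if } (z_{23}, z_{123}) \in \{(y_1, z_1), (\bar{y}_1, \bar{z}_1)\}, \\ \gamma & \text{if } (z_{23}, z_{123}) \in \{(z_3, z_3), (\bar{z}_3, \bar{z}_3)\}, \\ \frac{(1 - \gamma)\lambda_1}{\lambda_1 + 1} & \text{if } (z_{23}, z_{123}) \in \{(z_3, z_1), (\bar{z}_3, \bar{z}_1)\}, \\ \frac{1 - \gamma}{\lambda_1 + 1} & \text{if } (z_{23}, z_{123}) \in \{(z_3, \bar{z}_1), (\bar{z}_3, z_1)\}, \\ 0 & \text{otherwise.} \end{cases} \]
A short calculation shows that $\mathcal{P}$ is indeed an intermediate channel that degrades $Q'_{23}$ to $Q'_{123}$. ■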
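The split of Lemma 11 and the comparison in Lemma 12 can be illustrated numerically. The sketch below is ours and rests on the following assumptions: a symbol pair is a hypothetical tuple (a, b) = (W(y|0), W(ȳ|0)), and the capacity contribution of a pair is C(a, b) as in Section V-A, so that adding mass to an existing pair with the same likelihood ratio adds exactly the contribution of that mass. It computes the four quantities of (26) to (29) and the capacity increase of both merging methods applied to the same middle symbol.

```python
import math

def pair_capacity(a, b):
    """Capacity contribution C(a, b) of one symbol pair, with 0*log2(0) = 0."""
    def xlog2(x):
        return x * math.log2(x) if x > 0 else 0.0
    s = a + b
    return -xlog2(s) + s + xlog2(a) + xlog2(b)

def upgrade_lemma11(lam1, y2, lam3):
    """Split the mass of y2 = (a2, b2) between likelihood ratios lam1 < lam3.

    Returns ((alpha1, beta1), (alpha3, beta3)) as in (26)-(29).
    """
    a2, b2 = y2
    if math.isinf(lam3):
        return (lam1 * b2, b2), (a2 - lam1 * b2, 0.0)
    beta1 = (lam3 * b2 - a2) / (lam3 - lam1)
    beta3 = (a2 - lam1 * b2) / (lam3 - lam1)
    return (lam1 * beta1, beta1), (lam3 * beta3, beta3)

def capacity_increase(lam1, y2, lam3):
    """Capacity gained by Lemma 11 versus by Lemma 9 (promoting y2 to lam3)."""
    a2, b2 = y2
    (al1, be1), (al3, be3) = upgrade_lemma11(lam1, y2, lam3)
    base = pair_capacity(a2, b2)
    inc11 = pair_capacity(al1, be1) + pair_capacity(al3, be3) - base
    mass = a2 + b2
    if math.isinf(lam3):
        inc9 = pair_capacity(mass, 0.0) - base
    else:
        inc9 = pair_capacity(lam3 * mass / (lam3 + 1), mass / (lam3 + 1)) - base
    return inc11, inc9
```

For instance, with $\lambda_1 = 2$, $y_2 = (0.3, 0.1)$ and $\lambda_3 = 6$, the split is $(0.15, 0.075)$ and $(0.15, 0.025)$, and the capacity increase of Lemma 11 is several times smaller than that of Lemma 9, in line with Lemma 12.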
At this point, the reader may be wondering why we have chosen to state Lemma 9 at all. Namely, it is clear what disadvantages it has with respect to Lemma 11, but we have yet to indicate any advantages. Recalling the conditions of Lemma 11, we see that it cannot be employed when the set $\{\lambda_1, \lambda_2, \lambda_3\}$ contains non-unique elements. In fact, more is true. Ultimately, when one wants to implement the algorithms outlined in this paper, one will most probably use floating point numbers. Recall that a major source of numerical instability stems from subtracting two floating point numbers that are too close. By considering the denominator in (26) and (27) we see that $\lambda_1$ and $\lambda_3$ should not be too close. Moreover, by considering the numerators, we conclude that $\lambda_2$ should not be too close to either $\lambda_1$ or $\lambda_3$. So, when these cases do occur, our only option is Lemma 9.
We now define the merge-upgrading procedure we have used. Apart from an initial step, it is very similar to the merge-degrading procedure we have previously outlined. Assume we are given a BMS channel $W : \mathcal{X} \to \mathcal{Y}$ with an alphabet size of $2L$ and wish to reduce its alphabet size to $\mu$, while transforming $W$ into an upgraded version of itself. If $2L \le \mu$, then, as before, we are done. Otherwise, as in the merge-degrading procedure, we choose $L$ representatives $y_1, y_2, \ldots, y_L$, and order them according to their LR values, all of which are greater than or equal to 1. We now specify the preliminary step: for some specified
parameter epsilon (we have used e = 10~^), we check if there 
exists an index $1 \le i < L$ such that the ratio $LR(y_{i+1})/LR(y_i)$ is less than $1 + \epsilon$. If so, we apply Lemma 9 repeatedly, until no such index exists. Now comes the main step. We ask the following question: for which index $1 \le i \le L - 2$ does the channel resulting from the application of Lemma 11 to $W$, $y_i$, $y_{i+1}$, and $y_{i+2}$ result in a channel with smallest capacity increase? After finding the minimizing index $i$, we indeed apply Lemma 11 and get an upgraded channel $Q'$ with an alphabet size smaller by 2 than that of $W$. The same process is applied to $Q'$, until the output alphabet size is not more than $\mu$. As before, assuming the output alphabet size of $W$ is at most $2\mu^2$, an implementation similar to that given in Algorithm C will run in $O(\mu^2 \log \mu)$ time.

As was the case for degrading, the following theorem proves that no generality is lost by only considering merging of consecutive triplets of the form $y_i$, $y_{i+1}$, $y_{i+2}$ in the main step. The proof is given in Appendix B.

Theorem 13: Let $W : \mathcal{X} \to \mathcal{Y}$ be a BMS channel. Denote by $I_W$ the capacity of $W$ and by $I(y_1, y_2, y_3)$ the capacity one gets by the application of Lemma 11 to $W$ and symbols $y_1, y_2, y_3 \in \mathcal{Y}$ such that
\[ 1 \le LR(y_1) \le LR(y_2) \le LR(y_3) . \]
Let $LR(y_1) = \lambda_1$, $LR(y_2) = \lambda_2$, $LR(y_3) = \lambda_3$, $\pi_2 = W(y_2|0) + W(y_2|1)$, and denote the difference in capacities as
\[ \Delta[\lambda_1; \lambda_2, \pi_2; \lambda_3] = I(y_1, y_2, y_3) - I_W . \]
Then, for all $\lambda_1' \le \lambda_1$ and $\lambda_3' \ge \lambda_3$,
\[ \Delta[\lambda_1; \lambda_2, \pi_2; \lambda_3] \le \Delta[\lambda_1'; \lambda_2, \pi_2; \lambda_3'] . \tag{30} \]

We end this section by considering the running time of Algorithms A and B.

Theorem 14: Let an underlying BMS channel $W$, a fidelity parameter $\mu$, and codelength $n = 2^m$ be given. Assume that the output alphabet size of the underlying channel $W$ is at most $\mu$. The running time of either Algorithm A or Algorithm B is as follows. Approximating a single bit-channel takes $O(m \cdot \mu^2 \log \mu)$ time; approximating all $n$ bit-channels takes $O(n \cdot \mu^2 \log \mu)$ time.

Proof: Without loss of generality, we consider Algorithm A. Recall that the output alphabet size of $W$ is at most $\mu$. Thus, by induction, at the start of each loop the size of the output alphabet of $Q$ is at most $\mu$. Therefore, at each iteration, calculating $\mathcal{W}$ from $Q$ takes $O(\mu^2)$ time, since the output alphabet size of $\mathcal{W}$ is at most $2\mu^2$. Next, we have seen that each invocation of degrading_merge takes $O(\mu^2 \log \mu)$ time. The number of times we loop in Algorithm A is $m$. Thus, for a single bit-channel, the total running time is $O(m \cdot \mu^2 \log \mu)$.

As was explained at the end of Section IV, when approximating all $n$ bit-channels, the number of distinct channels that need to be approximated is $2n - 2$. Thus, the total running time in this case is $O(n \cdot \mu^2 \log \mu)$. ■
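As a quick check of the count used in the proof, the number of distinct channels is the number of non-empty bit prefixes of length at most $m$. The worked sum below also plugs in $n = 2^{20}$, the code length of Table I, purely as an illustration of the saving over the naive $n \cdot m$ count.

```latex
\[
  \sum_{j=1}^{m} 2^{j} \;=\; 2^{m+1} - 2 \;=\; 2n - 2 ,
  \qquad n = 2^{m} .
\]
\[
  n = 2^{20}: \qquad 2n - 2 = 2\,097\,150
  \quad\text{versus}\quad n \cdot m = 20\,971\,520 .
\]
```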

VI. Channels with Continuous Output Alphabet 

Recall that in order to apply either Algorithm A or B to an underlying BMS channel $W$, we had to thus far assume that $W$ has a finite output alphabet. In this section, we show two transforms (one degrading and the other upgrading) that transform a BMS channel with a continuous alphabet to a BMS channel with a specified finite output alphabet size. Thus, after applying the degrading (upgrading) transform we will shortly specify to $W$, we will be in a position to apply Algorithm A (B) and get a degraded (upgraded) approximation of $\mathcal{W}_i$. Moreover, we prove that both degraded and upgraded versions of our original channels have a guaranteed closeness to the original channel, in terms of difference of capacity.

Let $W$ be a given BMS channel with a continuous alphabet. We will make a few assumptions on $W$. First, we assume that the output alphabet of $W$ is the reals $\mathbb{R}$. Thus, for $y \in \mathbb{R}$, let $f(y|0)$ and $f(y|1)$ be the p.d.f. functions of the output given that the input was 0 and 1, respectively. Next, we require that the symmetry of $W$ manifest itself as
\[ f(y|0) = f(-y|1) , \quad \text{for all } y \in \mathbb{R} . \]
Also, for notational convenience, we require that
\[ f(y|0) \ge f(y|1) , \quad \text{for all } y \ge 0 . \tag{31} \]



Note that all of the above holds for the BAWGN channel (after 
renaming the input as —1). 

We now introduce some notation. For $y \ge 0$, define the likelihood ratio of $y$ as
\[ \lambda(y) = \frac{f(y|0)}{f(y|1)} . \tag{32} \]
As usual, if $f(y|1)$ is zero while $f(y|0)$ is not, we define $\lambda(y) = \infty$. Also, if both $f(y|0)$ and $f(y|1)$ are zero, then we arbitrarily define $\lambda(y) = 1$. Note that by (31), we have that $\lambda(y) \ge 1$.

Under these definitions, a short calculation shows that the capacity of $W$ is
\[ I(W) = \int_0^\infty \left( f(y|0) + f(y|1) \right) C[\lambda(y)] \, dy , \]
where for $1 \le \lambda < \infty$
\[ C[\lambda] = 1 - \frac{\lambda}{\lambda + 1} \log_2\left(1 + \frac{1}{\lambda}\right) - \frac{1}{\lambda + 1} \log_2(\lambda + 1) , \]
and (for continuity) we define $C[\infty] = 1$.

Let $\mu = 2\nu$ be the specified size of the degraded/upgraded channel output alphabet. An important property of $C[\lambda]$ is that it is strictly increasing in $\lambda$ for $\lambda \ge 1$. This property is easily proved, and will now be used to show that the following sets form a partition of the non-negative reals. For $1 \le i \le \nu - 1$, let
\[ A_i = \left\{ y \ge 0 : \frac{i - 1}{\nu} \le C[\lambda(y)] < \frac{i}{\nu} \right\} . \tag{33} \]
For $i = \nu$ we similarly define (changing the second inequality to a weak inequality)
\[ A_\nu = \left\{ y \ge 0 : \frac{\nu - 1}{\nu} \le C[\lambda(y)] \le 1 \right\} . \tag{34} \]
As we will see later on, we must assume that the sets $A_i$ are sufficiently "nice". This will indeed be the case for the BAWGN channel.


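A. Degrading transform

Essentially, our degrading procedure will consist of $\nu$ applications of the continuous analog of Lemma 7. Denote by $Q : \mathcal{X} \to \mathcal{Z}$ the degraded approximation of $W$ we are going to produce, where
\[ \mathcal{Z} = \{z_1, \bar{z}_1, z_2, \bar{z}_2, \ldots, z_\nu, \bar{z}_\nu\} . \]
We define $Q$ as follows.
\[ Q(z_i|0) = Q(\bar{z}_i|1) = \int_{A_i} f(y|0) \, dy , \tag{35} \]
\[ Q(\bar{z}_i|0) = Q(z_i|1) = \int_{A_i} f(-y|0) \, dy . \tag{36} \]

Lemma 15: The channel $Q : \mathcal{X} \to \mathcal{Z}$ is a BMS channel such that $Q \preccurlyeq W$.

Proof: It is readily seen that $Q$ is a BMS channel. To prove $Q \preccurlyeq W$, we now supply the intermediate channel $\mathcal{P} : \mathbb{R} \to \mathcal{Z}$.
\[ \mathcal{P}(z|y) = \begin{cases} 1 & \text{if } z = z_i \text{ and } y \in A_i, \\ 1 & \text{if } z = \bar{z}_i \text{ and } -y \in A_i, \\ 0 & \text{otherwise.} \end{cases} \]
■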



The following lemma bounds the loss in capacity incurred by the degrading operation.

Lemma 16: The difference in capacities of $Q$ and $W$ can be bounded as follows.
\[ 0 \le I(W) - I(Q) \le \frac{1}{\nu} = \frac{2}{\mu} . \tag{37} \]

Proof: The first inequality in (37) is a consequence of the degrading relation and (17). We now turn our attention to the second inequality.

Recall that since the $A_i$ partition the non-negative reals, the capacity of $W$ equals
\[ I(W) = \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) C[\lambda(y)] \, dy . \tag{38} \]
As for $Q$, we start by defining for $1 \le i \le \nu$ the ratio
\[ \theta_i = \frac{Q(z_i|0)}{Q(z_i|1)} , \]
where the cases of the numerator and/or denominator equaling zero are as in the definition of $\lambda(y)$. By this definition, similarly to the continuous case, the capacity of $Q$ is equal to
\[ I(Q) = \sum_{i=1}^{\nu} \left( Q(z_i|0) + Q(z_i|1) \right) C[\theta_i] . \tag{39} \]
Recall that by the definition of $A_i$ in (33) and (34), we have that for all $y \in A_i$,
\[ \frac{i - 1}{\nu} \le C[\lambda(y)] \le \frac{i}{\nu} . \]
Thus, by the definition of $Q(z_i|0)$ and $Q(z_i|1)$ in (35) and (36), respectively, we must have by the strict monotonicity of $C$ that
\[ \frac{i - 1}{\nu} \le C[\theta_i] \le \frac{i}{\nu} , \quad \text{if } Q(z_i|0) > 0 . \]
Thus, for all $y \in A_i$,
\[ \left| C[\theta_i] - C[\lambda(y)] \right| \le \frac{1}{\nu} , \quad \text{if } Q(z_i|0) > 0 . \]
Next, note that $Q(z_i|0) > 0$ implies $Q(z_i|0) + Q(z_i|1) > 0$. Thus, we may bound $I(Q)$ as follows,
\begin{align*}
I(Q) &= \sum_{i=1}^{\nu} \left( Q(z_i|0) + Q(z_i|1) \right) C[\theta_i] \\
&= \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) C[\theta_i] \, dy \\
&\ge \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) \left( C[\lambda(y)] - \frac{1}{\nu} \right) dy \\
&= \left( \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) C[\lambda(y)] \, dy \right) - \frac{1}{\nu} = I(W) - \frac{1}{\nu} ,
\end{align*}
which proves the second inequality. ■
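To make the degrading transform concrete, the following Python sketch quantizes a BAWGN channel and checks the bound of Lemma 16 numerically. It is an illustration under our own assumptions: input 0 is mapped to the mean $+1$ and input 1 to $-1$, so that $\lambda(y) = e^{2y/\sigma^2}$ is increasing in $y$ and each $A_i$ is an interval whose endpoints can be found by bisection; the capacity of $W$ is approximated by a plain trapezoidal rule. All function names are ours.

```python
import math

def C(lam):
    """C[lambda] from the text, for a likelihood ratio lam >= 1."""
    if math.isinf(lam):
        return 1.0
    return (1.0 - lam / (lam + 1) * math.log2(1 + 1 / lam)
            - math.log2(lam + 1) / (lam + 1))

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def lr(y, sigma):
    """Likelihood ratio of y >= 0 (exponent clipped to avoid overflow)."""
    return math.exp(min(2 * y / sigma**2, 700.0))

def degrade_bawgn(sigma, nu):
    """Return the nu pairs (Q(z_i|0), Q(conj(z_i)|0)) of (35) and (36)."""
    ts = [0.0]                       # interval endpoints of A_1, ..., A_nu
    for i in range(1, nu):
        lo, hi = 0.0, 1.0
        while C(lr(hi, sigma)) < i / nu:
            hi *= 2
        for _ in range(100):         # bisection on C[lambda(y)] = i / nu
            mid = (lo + hi) / 2
            if C(lr(mid, sigma)) < i / nu:
                lo = mid
            else:
                hi = mid
        ts.append(hi)
    ts.append(math.inf)
    return [(Phi((ts[i + 1] - 1) / sigma) - Phi((ts[i] - 1) / sigma),
             Phi((ts[i + 1] + 1) / sigma) - Phi((ts[i] + 1) / sigma))
            for i in range(nu)]

def capacity_discrete(pairs):
    """I(Q) as in (39)."""
    return sum((a + b) * C(a / b if b > 0 else math.inf)
               for a, b in pairs if a + b > 0)

def capacity_bawgn(sigma, steps=20000):
    """Trapezoidal approximation of I(W) on [0, 1 + 10*sigma]."""
    def f0(y):
        return (math.exp(-(y - 1) ** 2 / (2 * sigma**2))
                / (sigma * math.sqrt(2 * math.pi)))
    h = (1 + 10 * sigma) / steps
    total = 0.0
    for k in range(steps + 1):
        y = k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * (f0(y) + f0(-y)) * C(lr(y, sigma))
    return total * h
```

With, say, $\sigma = 1$ and $\nu = 8$, the difference capacity_bawgn(1.0) - capacity_discrete(degrade_bawgn(1.0, 8)) should fall between 0 and $1/\nu$, and in practice we would expect it to be well below the upper bound.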



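B. Upgrading transform

In parallel with the degrading case, our upgrading procedure will essentially consist of $\nu$ applications of the continuous analog of Lemma 9. Denote by $Q' : \mathcal{X} \to \mathcal{Z}'$ the upgraded approximation of $W$ we are going to produce, where
\[ \mathcal{Z}' = \{z_1, \bar{z}_1, z_2, \bar{z}_2, \ldots, z_\nu, \bar{z}_\nu\} . \]
As before, we will show that the loss in capacity due to the upgrading operation is at most $1/\nu$.

Let us now redefine the ratio $\theta_i$. Recalling that the function $C[\lambda]$ is strictly increasing in $\lambda \ge 1$, we deduce that it has an inverse in that range. Thus, for $1 \le i \le \nu$, we define $\theta_i \ge 1$ as follows.
\[ \theta_i = C^{-1}\left[ \frac{i}{\nu} \right] . \tag{40} \]
Note that for $i = \nu$, we have that $\theta_\nu = \infty$. Also, note that for $y \in A_i$ we have by (33) and (34) that
\[ 1 \le \lambda(y) \le \theta_i . \tag{41} \]
We now define $Q'$. For $1 \le i \le \nu$, let
\[ \pi_i = \int_{A_i} \left( f(\alpha|0) + f(-\alpha|0) \right) d\alpha . \tag{42} \]
Then,
\[ Q'(z|0) = \begin{cases} \frac{\theta_i \pi_i}{\theta_i + 1} & \text{if } z = z_i \text{ and } \theta_i \ne \infty, \\ \frac{\pi_i}{\theta_i + 1} & \text{if } z = \bar{z}_i \text{ and } \theta_i \ne \infty, \\ \pi_i & \text{if } z = z_i \text{ and } \theta_i = \infty, \\ 0 & \text{if } z = \bar{z}_i \text{ and } \theta_i = \infty, \end{cases} \tag{43} \]
and
\[ Q'(z_i|1) = Q'(\bar{z}_i|0) , \qquad Q'(\bar{z}_i|1) = Q'(z_i|0) . \tag{44} \]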



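Lemma 17: The channel $Q' : \mathcal{X} \to \mathcal{Z}'$ is a BMS channel such that $Q' \succcurlyeq W$.

Proof: As before, the proof that $Q'$ is a BMS channel is easy. To show that $Q' \succcurlyeq W$, we must supply the intermediate channel $\mathcal{P}$. The proof follows easily if we define $\mathcal{P} : \mathcal{Z}' \to \mathbb{R}$ as the cascade of two channels, $\mathcal{P}_1 : \mathcal{Z}' \to \mathbb{R}$ and $\mathcal{P}_2 : \mathbb{R} \to \mathbb{R}$.

The channel $\mathcal{P}_1 : \mathcal{Z}' \to \mathbb{R}$ is essentially a renaming channel. Denote by $g(\alpha|z)$ the p.d.f. of the output of $\mathcal{P}_1$ given that the input was $z$. Then, for $1 \le i \le \nu$,
\[ g(\alpha|z) = \begin{cases} \frac{f(\alpha|0) + f(-\alpha|0)}{\pi_i} & \text{if } z = z_i \text{ and } \alpha \in A_i, \\ \frac{f(\alpha|0) + f(-\alpha|0)}{\pi_i} & \text{if } z = \bar{z}_i \text{ and } -\alpha \in A_i, \\ 0 & \text{otherwise.} \end{cases} \tag{45} \]
Note that by (42), the function $g(\alpha|z)$ is indeed a p.d.f. for every fixed value of $z \in \mathcal{Z}'$.

Next, we turn to $\mathcal{P}_2 : \mathbb{R} \to \mathbb{R}$, the LR reducing channel. Let $\alpha \in A_i$ and recall the definition of $\lambda(y)$ given in (32). Define the quantity $p_\alpha$ as follows.
\[ p_\alpha = \begin{cases} \frac{\theta_i - \lambda(\alpha)}{(\lambda(\alpha) + 1)(\theta_i - 1)} & \text{if } 1 < \theta_i < \infty, \\ \frac{1}{2} & \text{if } \theta_i = 1, \\ \frac{1}{\lambda(\alpha) + 1} & \text{if } \theta_i = \infty \text{ and } \lambda(\alpha) < \infty, \\ 0 & \text{if } \lambda(\alpha) = \infty. \end{cases} \tag{46} \]
By (41) with $\alpha$ in place of $y$ we deduce that $0 \le p_\alpha \le 1/2$. We define the channel $\mathcal{P}_2 : \mathbb{R} \to \mathbb{R}$ as follows. For $y \ge 0$,
\[ \mathcal{P}_2(y|\alpha) = \begin{cases} 1 - p_\alpha & \text{if } y = \alpha, \\ p_\alpha & \text{if } y = -\alpha, \end{cases} \tag{47} \]
and
\[ \mathcal{P}_2(-y|-\alpha) = \mathcal{P}_2(y|\alpha) . \]
Consider the random variable $Y$, which is defined as the output of the concatenation of channels $Q'$, $\mathcal{P}_1$, and $\mathcal{P}_2$, given that the input to $Q'$ was 0. We must show that the p.d.f. of $Y$ is $f(y|0)$. To do this, we consider the limit
\[ \lim_{\epsilon \to 0} \frac{\mathrm{Prob}(y \le Y \le y + \epsilon)}{\epsilon} . \]
Consider first a $y$ such that $y \in A_i$, and assume further that $\epsilon$ is small enough so that the whole interval between $y$ and $y + \epsilon$ is in $A_i$. In this case, the above can be expanded to
\[ \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ Q'(z_i|0) \int_{y}^{y + \epsilon} g(\alpha|z_i)(1 - p_\alpha) \, d\alpha + Q'(\bar{z}_i|0) \int_{-y - \epsilon}^{-y} g(\alpha|\bar{z}_i) \, p_{-\alpha} \, d\alpha \right] . \]
Assuming that the two integrands are indeed integrable, this reduces to
\[ Q'(z_i|0) \cdot g(y|z_i)(1 - p_y) + Q'(\bar{z}_i|0) \cdot g(-y|\bar{z}_i) \, p_y . \]
From here, simple calculations indeed reduce the above to $f(y|0)$. The other cases are similar. ■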
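For completeness, the "simple calculations" at the end of the proof can be spelled out for the case $1 < \theta_i < \infty$. The derivation below is our own worked version and assumes the reading $Q'(z_i|0) = \theta_i \pi_i/(\theta_i + 1)$, $Q'(\bar{z}_i|0) = \pi_i/(\theta_i + 1)$ and $g(\pm y|\cdot) = (f(y|0) + f(-y|0))/\pi_i$, with $\pi_i$ the integral in (42). We abbreviate $\theta = \theta_i$, $\lambda = \lambda(y)$ and $s = f(y|0) + f(-y|0)$.

```latex
\begin{align*}
Q'(z_i|0)\, g(y|z_i)\,(1 - p_y) + Q'(\bar{z}_i|0)\, g(-y|\bar{z}_i)\, p_y
  &= \frac{s}{\theta + 1} \bigl[ \theta (1 - p_y) + p_y \bigr] \\
  &= \frac{s}{\theta + 1} \Bigl[ \theta - (\theta - 1)\,
       \frac{\theta - \lambda}{(\lambda + 1)(\theta - 1)} \Bigr] \\
  &= \frac{s}{\theta + 1} \cdot \frac{\lambda (\theta + 1)}{\lambda + 1}
   \;=\; s \, \frac{\lambda}{\lambda + 1} \;=\; f(y|0) ,
\end{align*}
% The last step uses f(-y|0) = f(y|1) = f(y|0) / \lambda, so that
% s = f(y|0) (\lambda + 1) / \lambda.
```

The case $\theta_i = \infty$ follows in the same way with $p_y = 1/(\lambda + 1)$ and $Q'(z_i|0) = \pi_i$.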

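As in the degrading case, we can bound the loss in capacity incurred by the upgrading operation.

Lemma 18: The difference in capacities of $Q'$ and $W$ can be bounded as follows,
\[ 0 \le I(Q') - I(W) \le \frac{1}{\nu} = \frac{2}{\mu} . \tag{48} \]

Proof: The first inequality in (48) is a consequence of the upgrading relation and (17). We now turn our attention to the second inequality.

For all $y \in A_i$, by (33), (34) and (40), we have that
\[ C[\theta_i] - C[\lambda(y)] \le \frac{1}{\nu} , \quad \text{if } Q'(z_i|0) > 0 . \]
Next, notice that by (43),
\[ \theta_i = \frac{Q'(z_i|0)}{Q'(z_i|1)} . \]
As in the degrading case, we have that $Q'(z_i|0) > 0$ implies $Q'(z_i|0) + Q'(z_i|1) > 0$. Thus, we may bound $I(Q')$ as follows,
\begin{align*}
I(Q') &= \sum_{i=1}^{\nu} \left( Q'(z_i|0) + Q'(z_i|1) \right) C[\theta_i] \\
&= \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) C[\theta_i] \, dy \\
&\le \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) \left( C[\lambda(y)] + \frac{1}{\nu} \right) dy \\
&= \left( \sum_{i=1}^{\nu} \int_{A_i} \left( f(y|0) + f(y|1) \right) C[\lambda(y)] \, dy \right) + \frac{1}{\nu} = I(W) + \frac{1}{\nu} ,
\end{align*}
which proves the second inequality. ■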

VII. Variations of Our Algorithms

As one might expect, Algorithms A and B can be tweaked and modified. As an example, we now show an improvement to Algorithm A for a specific case. As we will see in Section VIII, this improvement is key to proving Theorem 1. Also, it turns out that Algorithm A is compatible with the result by Guruswami and Xia [11], in the following sense: if we were to use Algorithm A with the same $n$ and $\mu$ dictated by [11], then we would be guaranteed a resulting code with parameters at least as good as those promised by [11].

Recall our description of how to construct a polar code given at the end of Section IV: obtain a degraded approximation of each bit-channel through the use of Algorithm A, and then select the $k$ best channels when ordered according to the upper bound on the probability of error. Note that Algorithm A returns a channel, but in this case only one attribute of that channel interests us, namely, the probability of error. In this section, we show how to specialize Algorithm A accordingly and benefit.

The specialized algorithm is given as Algorithm D. We note that the plots in this paper having to do with an upper bound on the probability of error were produced by running this algorithm. The key observation follows from Equations (26) and (27) in [3], which we now restate. Recall that $Z(W)$ is the Bhattacharyya parameter of the channel $W$. Then,
\[ Z(W \boxplus W) \le 2Z(W) - Z(W)^2 , \tag{49} \]
\[ Z(W \circledast W) = Z(W)^2 . \tag{50} \]

Algorithm D: An upper bound on the error probability

input : An underlying BMS channel $W$, a bound $\mu = 2\nu$ on the output alphabet size, a code length $n = 2^m$, an index $i$ with binary representation $i = \langle b_1, b_2, \ldots, b_m \rangle_2$.
output: An upper bound on $P_e(\mathcal{W}_i)$.

 1  Z ← Z(W)
 2  Q ← degrading_merge(W, μ)
 3  for j = 1, 2, ..., m do
 4      if b_j = 0 then
 5          𝒲 ← Q ⊞ Q
 6          Z ← min{Z(𝒲), 2Z - Z²}
 7      else
 8          𝒲 ← Q ⊛ Q
 9          Z ← Z²
10      Q ← degrading_merge(𝒲, μ)
11  return min{P_e(Q), Z}

Theorem 19: Let a codeword length $n = 2^m$, an index $0 \le i < n$, an underlying channel $W$, and a fidelity parameter $\mu = 2\nu$ be given. Denote by $\hat{p}_A$ and $\hat{p}_D$ the outputs of Algorithms A and D, respectively. Then,
\[ \hat{p}_A \ge \hat{p}_D \ge P_e(\mathcal{W}_i) . \]
That is, the bound produced by Algorithm D is always at least as good as that produced by Algorithm A.

Proof: Denote by $W^{(j)}$ the channel we are trying to approximate during iteration $j$. That is, we start with $W^{(0)} = W$. Then, iteratively, $W^{(j+1)}$ is gotten by transforming $W^{(j)}$ using either $\boxplus$ or $\circledast$, according to the value of $b_j$. Ultimately, we have $W^{(m)}$, which is simply the bit-channel $\mathcal{W}_i$.

The heart of the proof is to show that after iteration $j$ has completed (just after line 10 has executed), the variable Z is such that
\[ Z(W^{(j)}) \le Z \le 1 . \]
The proof is by induction. For the basis, note that before the first iteration starts (just after line 2 has executed), we have $Z = Z(W^{(0)})$. For the induction step, first note that $2Z - Z^2$ is both an increasing function of $Z$ and is between 0 and 1, when $0 \le Z \le 1$. Obviously, this is also true for $Z^2$. Now, note that at the end of iteration $j$ we have that the variable $\mathcal{W}$ is degraded with respect to $W^{(j)}$. Recalling (16), (49) and (50), the induction step is proved. ■

TABLE I
Upper and lower bounds on $P_{W,n}(k)$ for $W = \mathrm{BSC}(0.11)$, codeword length $n = 2^{20}$, and rate $k/n = 445340/2^{20} = 0.42471$.

             Algorithm A     Algorithm D     Algorithm B
μ = 8        5.096030e-??    1.139075e-??    1.601266e-??
μ = 16       6.926762e-05    2.695836e-05    4.296030e-08
μ = 64       1.808362e-06    1.801289e-06    7.362648e-07
μ = 128      1.142843e-06    1.142151e-06    8.943154e-07
μ = 256      1.023423e-06    1.023423e-06    9.382042e-07
μ = 512      ?               9.999497e-07    9.417541e-07



We end this section by referring to Table I. In the table, we fix the underlying channel, the codeword length, and the code rate. Then, we compare upper and lower bounds on $P_{W,n}(k)$, for various values of $\mu$. For a given $\mu$, the lower bound is obtained by running Algorithm B, while the two upper bounds are obtained by running Algorithms A and D. As can be seen, the upper bound supplied by Algorithm D is never worse than that of Algorithm A, and is noticeably better for small $\mu$.

VIII. Analysis 

As we have seen in previous sections, we can build polar codes by employing Algorithm D and gauge how far we are from the optimal construction by running Algorithm B. As can be seen in Figure 2, our construction turns out to be essentially optimal, for moderate sizes of $\mu$. However, we have yet to prove Theorem 1, which gives analytic justification to our method of construction. We do so in this section.

As background to Theorem 1, recall from [5] that for a polar code of length $n = 2^m$, the fraction of bit channels with probability of error less than $2^{-n^\beta}$ tends to the capacity of the underlying channel as $n$ goes to infinity, for $\beta < 1/2$. Moreover, the constraint $\beta < 1/2$ is tight in that the fraction of such channels is strictly less than the capacity, for $\beta > 1/2$. Thus, in this context, the restriction on $\beta$ imposed by Theorem 1 cannot be eased.

In order to prove Theorem 1, we make use of the results of Pedarsani, Hassani, Tal, and Telatar [19], in particular [19, Theorem 1] given below. We also point out that many ideas used in the proof of Theorem 1 appear, in one form or another, in [19, Theorem 2] and its proof.

Theorem 20 (Restatement of [19, Theorem 1]): Let an underlying BMS channel $W$ be given. Let $n = 2^m$ be the code length, and denote by $W_i^{(m)}$ the corresponding $i$th bit channel, where $0 \le i < n$. Next, denote by $Q_i^{(m)}(\nu)$ the degraded approximation of $W_i^{(m)}$ returned by running Algorithm A with parameters $W$, $\mu = 2\nu$, $i$, and $m$. Then,

$$\frac{\left|\left\{ i : I(W_i^{(m)}) - I(Q_i^{(m)}(\nu)) \ge \sqrt{m/\nu} \right\}\right|}{n} \le \sqrt{\frac{m}{\nu}}.$$

With respect to the above, we remark the following. Recall that in Subsection VI-A we introduced a method of degrading a continuous channel to a discrete one with at most $\mu = 2\nu$ symbols. In fact, there is nothing special about the continuous case: a slight modification can be used to degrade an arbitrary discrete channel to a discrete channel with at most $\mu$ symbols. Thus, we have an alternative to the merge-degrading method introduced in Subsection V-A. It follows easily that Theorem 20, and thus Theorem 1, would still hold had we used that alternative.

We now break the proof of Theorem 1 into several lemmas. Put simply, the first lemma states that a laxer requirement than that in Theorem 1 on the probability of error can be met.

Lemma 21: Let $Q_i^{(m)}(\nu)$ be as in Theorem 20. Then, for every $\delta > 0$ and $\epsilon > 0$ there exist an $m_0$ and a large enough $\mu = 2\nu$ such that

$$\frac{\left|\left\{ i_0 : Z(Q_{i_0}^{(m_0)}(\nu)) \le \delta \right\}\right|}{n_0} \ge I(W) - \epsilon, \tag{51}$$

where $n_0 = 2^{m_0}$ and $0 \le i_0 < n_0$.

We first note that Lemma 21 has a trivial proof: By [3, Theorem 2], we know that there exists an $m_0$ for which (51) holds, if $Q_{i_0}^{(m_0)}(\nu)$ is replaced by $W_{i_0}^{(m_0)}$. Thus, we may take $\mu$ large enough so that the pair-merging operation defined in Lemma 7 is never executed, and so $Q_{i_0}^{(m_0)}(\nu)$ is in fact equal to $W_{i_0}^{(m_0)}$.

This proof, although valid, implies a value of $\mu$ which is doubly exponential in $m_0$. We now give an alternative proof, which, as we have recently learned, is a precursor to the result of Guruswami and Xia [11]. Namely, we state this alternative proof since we have previously conjectured and now know by [11] that it implies a value of $m_0$ which is not too large.

Proof of Lemma 21: For simplicity of notation, let us drop the subscript 0 from $i_0$, $n_0$, and $m_0$. Recall that by [3, Theorem 1] we have that the capacity of bit channels polarizes. Specifically, for each $\epsilon_1 > 0$ and $\delta_1 > 0$ there exists an $m$ such that

$$\frac{\left|\left\{ i : I(W_i^{(m)}) \ge 1 - \delta_1 \right\}\right|}{n} \ge I(W) - \epsilon_1. \tag{52}$$

We can now combine the above with Theorem 20 and deduce that

$$\frac{\left|\left\{ i : I(Q_i^{(m)}(\nu)) \ge 1 - \delta_1 - \sqrt{m/\nu} \right\}\right|}{n} \ge I(W) - \epsilon_1 - \sqrt{\frac{m}{\nu}}. \tag{53}$$

Next, we claim that for each $\delta_2 > 0$ and $\epsilon_2 > 0$ there exist $m$ and $\mu = 2\nu$ such that

$$\frac{\left|\left\{ i : I(Q_i^{(m)}(\nu)) \ge 1 - \delta_2 \right\}\right|}{n} \ge I(W) - \epsilon_2. \tag{54}$$

To see this, take $\epsilon_1 = \epsilon_2/2$, $\delta_1 = \delta_2/2$, and let $m$ be the guaranteed constant such that (52) holds. Now, we can take $\nu$ big enough so that, in the context of (53), we have that both

$$\sqrt{\frac{m}{\nu}} < \frac{\delta_2}{2} \qquad \text{and} \qquad \sqrt{\frac{m}{\nu}} < \frac{\epsilon_2}{2}.$$

By [3, Equation (2)] we have that

$$Z(Q_i^{(m)}(\nu)) \le \sqrt{1 - I^2(Q_i^{(m)}(\nu))}.$$

Thus, if (54) holds then

$$\frac{\left|\left\{ i : Z(Q_i^{(m)}(\nu)) \le \sqrt{2\delta_2 - \delta_2^2} \right\}\right|}{n} \ge I(W) - \epsilon_2.$$

So, as before, we deduce that for every $\delta_3 > 0$ and $\epsilon_3 > 0$ there exist $m$ and $\mu$ such that

$$\frac{\left|\left\{ i : Z(Q_i^{(m)}(\nu)) \le \delta_3 \right\}\right|}{n} \ge I(W) - \epsilon_3.$$
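The choice of $\nu$ in this proof can be made explicit. As a worked step that is not part of the original text, and under the reading that the two requirements on $\nu$ are $\sqrt{m/\nu} < \delta_2/2$ and $\sqrt{m/\nu} < \epsilon_2/2$, both hold exactly when:

```latex
\sqrt{\frac{m}{\nu}} < \frac{\min(\delta_2, \epsilon_2)}{2}
\quad\Longleftrightarrow\quad
\nu > \frac{4m}{\min(\delta_2, \epsilon_2)^2}.
```

For instance, with $m = 10$ and $\delta_2 = \epsilon_2 = 0.1$, any $\nu > 4000$ suffices. This is polynomial in $m$, in contrast to the doubly exponential value implied by the trivial proof.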



The next lemma will be used later to bound the evolution of the variable $Z$ in Algorithm D.

Lemma 22: For every $m \ge 0$ and index $0 \le i < 2^m$ let there be a corresponding real $0 \le \zeta(i,m) \le 1$. Denote the binary representation of $i$ by $i = \langle b_1, b_2, \ldots, b_m \rangle_2$. Assume that the $\zeta(i,m)$ satisfy the following recursive relation. For $m > 0$ and $i' = \langle b_1, b_2, \ldots, b_{m-1} \rangle_2$,

$$\zeta(i,m) \le \begin{cases} 2\zeta(i',m-1) - \zeta^2(i',m-1) & \text{if } b_m = 0, \\ \zeta^2(i',m-1) & \text{otherwise.} \end{cases} \tag{55}$$

Then, for every $\beta < 1/2$ we have that

$$\liminf_{m \to \infty} \frac{\left|\left\{ i : \zeta(i,m) < 2^{-n^\beta} \right\}\right|}{n} \ge 1 - \zeta(0,0), \tag{56}$$

where $n = 2^m$.

Proof: First, note that both $f_1(\zeta) = \zeta^2$ and $f_2(\zeta) = 2\zeta - \zeta^2$ strictly increase from 0 to 1 when $\zeta$ ranges from 0 to 1. Thus, it suffices to prove the claim for the worst case in which the inequality in (55) is replaced by an equality. Assume from now on that this is indeed the case.

Consider an underlying BEC with probability of erasure (as well as Bhattacharyya parameter) $\zeta(0,0)$. Next, note that the $i$th bit channel, for $0 \le i < n = 2^m$, is also a BEC, with probability of erasure $\zeta(i,m)$. Since the capacity of the underlying BEC is $1 - \zeta(0,0)$, we deduce (56) by [5, Theorem 2]. ■
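The worst case in this proof (the recursion with equality, that is, the BEC erasure probabilities) is easy to explore numerically. The sketch below is an illustration only; the function name `fraction_below` and its parameters are hypothetical, and it enumerates all $2^m$ values, so it is meant for small $m$.

```python
def fraction_below(zeta0, m, beta):
    """Fraction of indices i < 2**m with zeta(i, m) < 2**(-n**beta), n = 2**m.

    Assumption: the recursion (55) holds with equality, as for a BEC
    with erasure probability zeta0.
    """
    values = [zeta0]
    for _ in range(m):
        nxt = []
        for z in values:
            nxt.append(2.0 * z - z * z)  # branch b = 0
            nxt.append(z * z)            # branch b = 1
        values = nxt
    n = 2 ** m
    threshold = 2.0 ** (-(n ** beta))
    return sum(1 for z in values if z < threshold) / n
```

Since the lemma is an asymptotic statement, the fraction computed for a finite $m$ may well be smaller than $1 - \zeta(0,0)$; the sketch only shows how the quantity inside the limit is evaluated.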
We are now in a position to prove Theorem 1.

Proof of Theorem 1: Let us first specify explicitly the code construction algorithm used, and then analyze it. As expected, we simply run Algorithm D with parameters $W$ and $n$ to produce upper bounds on the probability of error of all $n$ bit channels. Then, we sort the upper bounds in ascending order. Finally, we produce a generator matrix $G$ with $k$ rows. The rows of $G$ correspond to the first $k$ bit channels according to the sorted order, and $k$ is the largest integer such that the sum of upper bounds is strictly less than $2^{-n^\beta}$. By Theorem [?], the total running time is indeed $O(n \cdot \mu^2 \log \mu)$.
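The selection step just described (sort, then take the longest prefix whose sum stays strictly below the target) can be sketched as follows. This is a minimal illustration, assuming the per-channel upper bounds have already been computed; the name `choose_information_set` and the argument `threshold` (standing in for $2^{-n^\beta}$) are hypothetical.

```python
def choose_information_set(upper_bounds, threshold):
    """Return the indices of the k best bit channels, in sorted order.

    k is the largest integer such that the sum of the k smallest upper
    bounds is strictly less than `threshold`.
    """
    order = sorted(range(len(upper_bounds)), key=lambda i: upper_bounds[i])
    chosen = []
    total = 0.0
    for i in order:
        if total + upper_bounds[i] >= threshold:
            break  # adding this channel would reach or exceed the target
        total += upper_bounds[i]
        chosen.append(i)
    return chosen
```

Because the bounds are non-negative and visited in ascending order, stopping at the first violation yields the largest feasible $k$.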

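Recall our definition of $W_i^{(m)}$ and $Q_i^{(m)}$ from Theorem 20. Denote the upper bound on the probability of error returned by Algorithm D for bit channel $i$ by $P_e(W_i^{(m)}, \mu)$. The theorem will follow easily once we prove that for all $\epsilon > 0$ and $0 < \beta < 1/2$ there exists an even $\mu_0$ such that for all $\mu = 2\nu \ge \mu_0$ we have

$$\liminf_{m \to \infty} \frac{\left|\left\{ i : P_e(W_i^{(m)}, \mu) < 2^{-n^\beta} \right\}\right|}{n} \ge I(W) - \epsilon. \tag{57}$$

By Lemma 21, there exist constants $m_0$ and $\nu$ such that

$$\frac{\left|\left\{ i_0 : Z(Q_{i_0}^{(m_0)}(\nu)) \le \frac{\epsilon}{2} \right\}\right|}{n_0} \ge I(W) - \frac{\epsilon}{2}, \tag{58}$$

where $n_0 = 2^{m_0}$ and $0 \le i_0 < n_0$.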



Denote the codeword length as $n = 2^m$, where $m = m_0 + m_1$ and $m_1 > 0$. Consider an index $0 \le i < n$ having binary representation

$$i = \langle b_1, b_2, \ldots, b_{m_0}, b_{m_0+1}, \ldots, b_m \rangle_2,$$

where $b_1$ is the most significant bit. We split the run of Algorithm D on $i$ into two stages. The first stage will have $j$ going from 1 to $m_0$, while the second stage will have $j$ going from $m_0 + 1$ to $m$.

We start by considering the end of the first stage. Namely, we are at iteration $j = m_0$ and line 10 has just finished executing. Recall that we denote the value of the variable $Q$ after the line has executed by $Q_{i_0}^{(m_0)}(\nu)$, where

$$i_0 = \langle b_1, b_2, \ldots, b_{m_0} \rangle_2.$$

Similarly, define $Z_{i_0}^{(m_0)}(\nu)$ as the value of the variable $Z$ at that point. Since, by (16), degrading increases the Bhattacharyya parameter, we have then that the Bhattacharyya parameter of the variable $\mathcal{W}$ is less than or equal to that of the variable $Q$. So, by the minimization carried out in either line 6 or 9, we conclude the following: at the end of line 10 of the algorithm, when $j = m_0$,

$$Z_{i_0}^{(m_0)}(\nu) \le Z(Q_{i_0}^{(m_0)}(\nu)) = Z(Q).$$

We can combine this observation with (58) to conclude that

$$\frac{\left|\left\{ i_0 : Z_{i_0}^{(m_0)}(\nu) \le \frac{\epsilon}{2} \right\}\right|}{n_0} \ge I(W) - \frac{\epsilon}{2}. \tag{59}$$



We now move on to consider the second stage of the algorithm. Fix an index $i = \langle b_1, b_2, \ldots, b_{m_0}, b_{m_0+1}, \ldots, b_m \rangle_2$. That is, let $i$ have $i_0$ as a binary prefix of length $m_0$. Denote by $Z[t]$ the value of $Z$ at the end of line 10 when $j = m_0 + t$. By lines 6 and 9 of the algorithm we have, similarly to (55), that

$$Z[t+1] \le \begin{cases} 2Z[t] - Z^2[t] & \text{if } b_{m_0+t+1} = 0, \\ Z^2[t] & \text{otherwise.} \end{cases}$$


We now combine our observations about the two stages. Let $\gamma$ be a constant such that

$$\beta < \gamma < \frac{1}{2}.$$

Considering (59), we see that out of the $n_0 = 2^{m_0}$ possible prefixes of length $m_0$, the fraction for which

$$Z[0] \le \frac{\epsilon}{2} \tag{60}$$

is at least $I(W) - \epsilon/2$. Next, by Lemma 22 we see that for each such prefix, the fraction of suffixes for which

$$Z[m_1] \le 2^{-(n_1)^\gamma} \tag{61}$$

is at least $1 - Z[0]$, as $n_1 = 2^{m_1}$ tends to infinity. Thus, for each such prefix, we get by (60) that (in the limit) the fraction of such suffixes is at least $1 - \epsilon/2$. We can now put all our bounds together and claim that as $m_1$ tends to infinity, the fraction of indices $0 \le i < 2^m$ for which (61) holds is at least

$$\left( I(W) - \frac{\epsilon}{2} \right) \left( 1 - \frac{\epsilon}{2} \right) \ge I(W) - \epsilon.$$

By line 11 of Algorithm D, we see that $Z[m_1]$ is an upper bound on the return value $P_e(W_i^{(m)}, \mu)$. Thus, we conclude that

$$\liminf_{m \to \infty} \frac{\left|\left\{ i : P_e(W_i^{(m)}, \mu) < 2^{-(n_1)^\gamma} \right\}\right|}{n} \ge I(W) - \epsilon.$$

With the above at hand, the only thing left to do in order to prove (57) is to show that for $m_1$ large enough we have that

$$2^{-(n_1)^\gamma} \le 2^{-n^\beta},$$

which, since $n = n_0 n_1$, reduces to showing that

$$(n_1)^{\gamma - \beta} \ge (n_0)^\beta.$$

Since $\gamma > \beta$ and $n_0 = 2^{m_0}$ is constant, this is indeed the case.

■
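The last step of the proof can be made quantitative. As a worked step that is not part of the original text, write $n = n_0 n_1$ with $n_0 = 2^{m_0}$ and $n_1 = 2^{m_1}$; the requirement $2^{-(n_1)^\gamma} \le 2^{-n^\beta}$ is $(n_1)^\gamma \ge (n_0 n_1)^\beta$, and taking base-2 logarithms gives:

```latex
(n_1)^{\gamma-\beta} \ge (n_0)^{\beta}
\;\Longleftrightarrow\;
2^{m_1(\gamma-\beta)} \ge 2^{m_0 \beta}
\;\Longleftrightarrow\;
m_1 \ge \frac{\beta}{\gamma-\beta}\, m_0 .
```

So the second stage only needs to be longer than the first by a constant factor that depends on $\beta$ and $\gamma$ alone.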

We end this section by pointing out a similarity between the analysis used here and the analysis carried out in [12]. In both papers, there are two stages. The first stage (prefix of length $m_0$ in our paper) makes full use of the conditional probability distribution of the channel, while the second stage uses a simpler rule (evolving the bound on the Bhattacharyya parameter in our paper and using an RM rule in [12]).



Appendix A
Proof of Theorem 8

This appendix is devoted to the proof of Theorem 8. Although the initial lemmas needed for the proof are rather intuitive, the latter ones seem to be a lucky coincidence (probably due to a lack of a deeper understanding on the authors' part). The prime example seems to be Equation (69) in the proof of Lemma 27.

We start by defining some notation. Let $W : \mathcal{X} \to \mathcal{Y}$, $\nu$, $y_1, y_2, \ldots, y_\nu$ and $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_\nu$ be as in Theorem 8. Let $w \in \mathcal{Y}$ and $\bar{w} \in \mathcal{Y}$ be a symbol pair, and denote by $(a,b)$ the corresponding probability pair, where

$$a = p(w|0) = p(\bar{w}|1), \qquad b = p(w|1) = p(\bar{w}|0).$$

The contribution of this probability pair to the capacity of $W$ is denoted by

$$C(a,b) = -(a+b)\log_2\!\big((a+b)/2\big) + a\log_2(a) + b\log_2(b) = -(a+b)\log_2(a+b) + a\log_2(a) + b\log_2(b) + (a+b),$$

where $0 \log_2 0 = 0$.

Next, suppose we are given two probability pairs, $(a_1,b_1)$ and $(a_2,b_2)$, corresponding to the symbol pairs $w_1, \bar{w}_1$ and $w_2, \bar{w}_2$, respectively. The capacity difference resulting from the application of Lemma 7 to $w_1$ and $w_2$ is denoted by

$$\Delta(a_1,b_1;a_2,b_2) = C(a_1,b_1) + C(a_2,b_2) - C(a_1+a_2,\, b_1+b_2).$$

For reasons that will become apparent later on, we henceforth relax the definition of a probability pair to two non-negative numbers, the sum of which may be greater than 1. Note that $C(a,b)$ is still well defined with respect to this generalization, as is $\Delta(a_1,b_1;a_2,b_2)$. Furthermore, to exclude trivial cases, we require that a probability pair $(a,b)$ has at least one positive element.

The following lemma states that we lose capacity by performing a downgrading merge.

Lemma 23: Let $(a_1,b_1)$ and $(a_2,b_2)$ be two probability pairs. Then,

$$\Delta(a_1,b_1;a_2,b_2) \ge 0.$$

Proof: Assume first that $a_1, b_1, a_2, b_2$ are all positive. In this case, $\Delta(a_1,b_1;a_2,b_2)$ can be written as follows:

$$\begin{aligned}
(a_1+a_2)\Bigg[ &\frac{-a_1}{a_1+a_2}\log_2\frac{(a_1+b_1)(a_1+a_2)}{a_1(a_1+b_1+a_2+b_2)} \\
+ &\frac{-a_2}{a_1+a_2}\log_2\frac{(a_2+b_2)(a_1+a_2)}{a_2(a_1+b_1+a_2+b_2)} \Bigg] \\
+\,(b_1+b_2)\Bigg[ &\frac{-b_1}{b_1+b_2}\log_2\frac{(a_1+b_1)(b_1+b_2)}{b_1(a_1+b_1+a_2+b_2)} \\
+ &\frac{-b_2}{b_1+b_2}\log_2\frac{(a_2+b_2)(b_1+b_2)}{b_2(a_1+b_1+a_2+b_2)} \Bigg].
\end{aligned}$$

By Jensen's inequality, both the first two lines and the last two lines can be lower bounded by 0. The proof for cases in which some of the variables equal zero is much the same. ■

The intuition behind the following lemma is that the order of merging does not matter in terms of total capacity lost.

Lemma 24: Let $(a_1,b_1)$, $(a_2,b_2)$, and $(a_3,b_3)$ be three probability pairs. Then,

$$\Delta(a_1,b_1;a_2,b_2) + \Delta(a_1+a_2,\, b_1+b_2;\, a_3,b_3) = \Delta(a_2,b_2;a_3,b_3) + \Delta(a_1,b_1;\, a_2+a_3,\, b_2+b_3).$$

Proof: Both sides of the equation equal

$$C(a_1,b_1) + C(a_2,b_2) + C(a_3,b_3) - C(a_1+a_2+a_3,\, b_1+b_2+b_3).$$

■

Instead of working with a probability pair $(a,b)$, we find it easier to work with a probability sum $\pi = a + b$ and likelihood ratio $\lambda = a/b$. Of course, we can go back to our previous representation as follows. If $\lambda = \infty$ then $a = \pi$ and $b = 0$. Otherwise, $a = \frac{\lambda \pi}{\lambda+1}$ and $b = \frac{\pi}{\lambda+1}$. Recall that our relaxation of the term "probability pair" implies that $\pi$ is positive and it may be greater than 1.

Abusing notation, we define the quantity $C$ through $\lambda$ and $\pi$ as well. For $\lambda = \infty$ we have $C[\infty,\pi] = \pi$. Otherwise,

$$C[\lambda,\pi] = \pi\left( -\frac{\lambda}{\lambda+1}\log_2\left(1+\frac{1}{\lambda}\right) - \frac{1}{\lambda+1}\log_2(1+\lambda) \right) + \pi.$$

Let us next consider merging operations. The merging of the symbol pair corresponding to $[\lambda_1,\pi_1]$ with that of $[\lambda_2,\pi_2]$ gives a symbol pair with $[\lambda_{1,2},\pi_{1,2}]$, where

$$\pi_{1,2} = \pi_1 + \pi_2$$

and

$$\lambda_{1,2} = \bar{\lambda}[\pi_1,\lambda_1;\pi_2,\lambda_2] = \frac{\lambda_1\pi_1(\lambda_2+1) + \lambda_2\pi_2(\lambda_1+1)}{\pi_1(\lambda_2+1) + \pi_2(\lambda_1+1)}. \tag{62}$$

Abusing notation, define

$$\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2] = C[\lambda_1,\pi_1] + C[\lambda_2,\pi_2] - C[\lambda_{1,2},\pi_{1,2}].$$

Clearly, we have that the new definition of $\Delta$ is symmetric:

$$\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2] = \Delta[\lambda_2,\pi_2;\lambda_1,\pi_1]. \tag{63}$$
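The quantities defined so far in this appendix are simple to evaluate, which makes Lemmas 23 and 24 and the merge rule (62) easy to check numerically. The sketch below is an illustration under the definitions of this appendix; the function names `C`, `delta`, and `merged_ratio` are hypothetical.

```python
from math import log2

def xlog2x(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return 0.0 if x == 0 else x * log2(x)

def C(a, b):
    """Contribution of the probability pair (a, b) to the capacity."""
    s = a + b
    return -xlog2x(s) + xlog2x(a) + xlog2x(b) + s

def delta(a1, b1, a2, b2):
    """Capacity lost by merging the pairs (a1, b1) and (a2, b2)."""
    return C(a1, b1) + C(a2, b2) - C(a1 + a2, b1 + b2)

def merged_ratio(pi1, lam1, pi2, lam2):
    """Likelihood ratio of the merged pair, as in (62); finite ratios only."""
    num = lam1 * pi1 * (lam2 + 1) + lam2 * pi2 * (lam1 + 1)
    den = pi1 * (lam2 + 1) + pi2 * (lam1 + 1)
    return num / den
```

For example, `delta(0.2, 0.1, 0.4, 0.2)` is zero up to rounding, since both pairs have likelihood ratio 2 and merging them loses no capacity.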

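Lemma 25: $\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2]$ is monotonic increasing in both $\pi_1$ and $\pi_2$.

Proof: Recall from (63) that $\Delta$ is symmetric, and so it suffices to prove the claim for $\pi_1$. Thus, our goal is to prove the following for all $\rho > 0$,

$$\Delta[\lambda_1,\pi_1+\rho;\lambda_2,\pi_2] \ge \Delta[\lambda_1,\pi_1;\lambda_2,\pi_2].$$

At this point, we find it useful to convert back from the likelihood ratio/probability sum representation $[\lambda,\pi]$ to the probability pair representation $(a,b)$. Denote by $(a_1,b_1)$, $(a_2,b_2)$, and $(a',b')$ the probability pairs corresponding to $[\lambda_1,\pi_1]$, $[\lambda_2,\pi_2]$, and $[\lambda_1,\pi_1+\rho]$, respectively. Let $a_3 = a' - a_1$ and $b_3 = b' - b_1$. Next, since both $(a_1,b_1)$ and $(a',b')$ have the same likelihood ratio, we deduce that both $a_3$ and $b_3$ are non-negative. Under our new notation, we must prove that

$$\Delta(a_1+a_3,\, b_1+b_3;\, a_2,b_2) \ge \Delta(a_1,b_1;a_2,b_2).$$

Since both $(a_1,b_1)$ and $(a',b')$ have likelihood ratio $\lambda_1$, this is also the case for $(a_3,b_3)$. Thus, a simple calculation shows that

$$\Delta(a_1,b_1;a_3,b_3) = 0.$$

Hence,

$$\Delta(a_1+a_3,\, b_1+b_3;\, a_2,b_2) = \Delta(a_1+a_3,\, b_1+b_3;\, a_2,b_2) + \Delta(a_1,b_1;a_3,b_3).$$

Next, by Lemma 24,

$$\Delta(a_1+a_3,\, b_1+b_3;\, a_2,b_2) + \Delta(a_1,b_1;a_3,b_3) = \Delta(a_1,b_1;a_2,b_2) + \Delta(a_1+a_2,\, b_1+b_2;\, a_3,b_3).$$

Since, by Lemma 23, we have that $\Delta(a_1+a_2,\, b_1+b_2;\, a_3,b_3)$ is non-negative, we are done. ■

We are now at the point in which our relaxation of the term "probability pair" can be put to good use. Namely, we will now see how to reduce the number of variables involved by one, by taking a certain probability sum to infinity.

Lemma 26: Let $\lambda_1$, $\pi_1$, and $\lambda_2$ be given. Assume that $0 < \lambda_2 < \infty$. Define

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] = \lim_{\pi_2 \to \infty} \Delta[\lambda_1,\pi_1;\lambda_2,\pi_2].$$

If $0 < \lambda_1 < \infty$, then

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] = \pi_1\left( -\frac{\lambda_1}{\lambda_1+1}\log_2\left(\frac{1+\frac{1}{\lambda_1}}{1+\frac{1}{\lambda_2}}\right) - \frac{1}{\lambda_1+1}\log_2\left(\frac{1+\lambda_1}{1+\lambda_2}\right) \right). \tag{64}$$

If $\lambda_1 = \infty$, then

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] = \pi_1\left( -\log_2\left(\frac{1}{1+\frac{1}{\lambda_2}}\right) \right). \tag{65}$$

If $\lambda_1 = 0$, then

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] = \pi_1\left( -\log_2\left(\frac{1}{1+\lambda_2}\right) \right). \tag{66}$$

Proof: Consider first the case $0 < \lambda_1 < \infty$. We write out $\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2]$ in full and after rearrangement get

$$\begin{aligned}
&\pi_1\left( \frac{1}{\frac{1}{\lambda_{1,2}}+1}\log_2\left(1+\frac{1}{\lambda_{1,2}}\right) - \frac{1}{\frac{1}{\lambda_1}+1}\log_2\left(1+\frac{1}{\lambda_1}\right) \right) \\
+\,&\pi_1\left( \frac{1}{\lambda_{1,2}+1}\log_2(1+\lambda_{1,2}) - \frac{1}{\lambda_1+1}\log_2(1+\lambda_1) \right) \\
+\,&\pi_2\left( \frac{1}{\frac{1}{\lambda_{1,2}}+1}\log_2\left(1+\frac{1}{\lambda_{1,2}}\right) - \frac{1}{\frac{1}{\lambda_2}+1}\log_2\left(1+\frac{1}{\lambda_2}\right) \right) \\
+\,&\pi_2\left( \frac{1}{\lambda_{1,2}+1}\log_2(1+\lambda_{1,2}) - \frac{1}{\lambda_2+1}\log_2(1+\lambda_2) \right),
\end{aligned} \tag{67}$$

where $\lambda_{1,2}$ is given in (62). Next, note that

$$\lim_{\pi_2 \to \infty} \lambda_{1,2} = \lambda_2.$$

Thus, applying $\lim_{\pi_2 \to \infty}$ to the first two lines of (67) is straightforward. Next, consider the third line of (67), and write its limit as

$$\lim_{\pi_2 \to \infty} \frac{\frac{1}{\frac{1}{\lambda_{1,2}}+1}\log_2\left(1+\frac{1}{\lambda_{1,2}}\right) - \frac{1}{\frac{1}{\lambda_2}+1}\log_2\left(1+\frac{1}{\lambda_2}\right)}{\frac{1}{\pi_2}}.$$

Since $\lim_{\pi_2 \to \infty} \lambda_{1,2} = \lambda_2$, we get that both numerator and denominator tend to 0 as $\pi_2 \to \infty$. Thus, we apply l'Hôpital's rule and get

$$\begin{aligned}
&\lim_{\pi_2 \to \infty} \frac{\frac{1}{(\lambda_{1,2}+1)^2}\left(\log_2 e - \log_2\left(1+\frac{1}{\lambda_{1,2}}\right)\right)\frac{\partial \lambda_{1,2}}{\partial \pi_2}}{\frac{1}{(\pi_2)^2}} \\
=\,&\lim_{\pi_2 \to \infty} \frac{\frac{1}{(\lambda_{1,2}+1)^2}\left(\log_2 e - \log_2\left(1+\frac{1}{\lambda_{1,2}}\right)\right)\frac{\pi_1(\lambda_2+1)(\lambda_1+1)(\lambda_2-\lambda_1)}{\left[\pi_1(\lambda_2+1)+\pi_2(\lambda_1+1)\right]^2}}{\frac{1}{(\pi_2)^2}} \\
=\,&\frac{\pi_1(\lambda_2-\lambda_1)}{(\lambda_1+1)(\lambda_2+1)}\left(\log_2 e - \log_2\left(1+\frac{1}{\lambda_2}\right)\right),
\end{aligned}$$

where $e = 2.71828\ldots$ is Euler's number. Similarly, taking the $\lim_{\pi_2 \to \infty}$ of the fourth line of (67) gives

$$-\frac{\pi_1(\lambda_2-\lambda_1)}{(\lambda_1+1)(\lambda_2+1)}\left(\log_2 e - \log_2(1+\lambda_2)\right).$$

Thus, a short calculation finishes the proof for this case. The cases $\lambda_1 = \infty$ and $\lambda_1 = 0$ are handled much the same way. ■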
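The limit in Lemma 26 can be checked numerically by evaluating the finite-$\pi_2$ capacity difference for a large $\pi_2$ and comparing it with the closed form. The sketch below is an illustration only, restricted to finite positive likelihood ratios; the helper names (`g`, `merged_ratio`, `delta`, `delta_limit`) are hypothetical, and `g` relies on the identity $C[\lambda,\pi] = \pi\,(g(\lambda)+1)$.

```python
from math import log2

def g(lam):
    """C[lam, pi] = pi * (g(lam) + 1), for 0 < lam < infinity."""
    return -lam / (lam + 1) * log2(1 + 1 / lam) - log2(1 + lam) / (lam + 1)

def merged_ratio(pi1, lam1, pi2, lam2):
    """Likelihood ratio of the merged pair, as in (62)."""
    num = lam1 * pi1 * (lam2 + 1) + lam2 * pi2 * (lam1 + 1)
    den = pi1 * (lam2 + 1) + pi2 * (lam1 + 1)
    return num / den

def delta(lam1, pi1, lam2, pi2):
    """Capacity difference for finite probability sums."""
    lam12 = merged_ratio(pi1, lam1, pi2, lam2)
    return pi1 * g(lam1) + pi2 * g(lam2) - (pi1 + pi2) * g(lam12)

def delta_limit(lam1, pi1, lam2):
    """Closed form for the limit pi2 -> infinity, case 0 < lam1 < infinity."""
    first = -lam1 / (lam1 + 1) * log2((1 + 1 / lam1) / (1 + 1 / lam2))
    second = -log2((1 + lam1) / (1 + lam2)) / (lam1 + 1)
    return pi1 * (first + second)
```

Evaluating `delta(2, 0.4, 3, pi2)` for growing `pi2` gives values that increase towards `delta_limit(2, 0.4, 3)`, in line with the monotonicity stated in Lemma 25.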
The utility of the next lemma is that it asserts a stronger claim than the "Moreover" part of Theorem 8, for a specific value of $\lambda_2$.

Lemma 27: Let probability pairs $(a_1,b_1)$ and $(a_3,b_3)$ have likelihood ratios $\lambda_1$ and $\lambda_3$, respectively. Assume $\lambda_1 \le \lambda_3$. Denote $\pi_1 = a_1 + b_1$ and $\pi_3 = a_3 + b_3$. Let

$$\lambda_2 = \lambda_{1,3} = \bar{\lambda}[\pi_1,\lambda_1;\pi_3,\lambda_3], \tag{68}$$

as defined in (62). Then,

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3]$$

and

$$\Delta[\lambda_3,\pi_3;\lambda_2,\infty] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3].$$

Proof: We start by taking care of a trivial case. Note that if it is not the case that $0 < \lambda_2 < \infty$, then $\lambda_1 = \lambda_2 = \lambda_3$, and the proof follows easily.

So, we henceforth assume that $0 < \lambda_2 < \infty$, as was done in Lemma 26. Let

$$\begin{aligned}
\Delta_{(1,3)} &= \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3], \\
\Delta'_{(1,2)} &= \Delta[\lambda_1,\pi_1;\lambda_2,\infty], \\
\Delta'_{(2,3)} &= \Delta[\lambda_3,\pi_3;\lambda_2,\infty].
\end{aligned}$$

Thus, we must prove that $\Delta'_{(1,2)} \le \Delta_{(1,3)}$ and $\Delta'_{(2,3)} \le \Delta_{(1,3)}$. Luckily, Lemma 26 and a bit of calculation yield that

$$\Delta'_{(1,2)} + \Delta'_{(2,3)} = \Delta_{(1,3)}. \tag{69}$$

Recall that $\Delta'_{(1,2)}$ and $\Delta'_{(2,3)}$ must be non-negative by Lemmas 23 and 25. Thus, we are done. ■
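Identity (69) is the kind of statement that is reassuring to confirm numerically. The sketch below is an illustration only, restricted to finite positive likelihood ratios; all helper names are hypothetical, and the sample values are arbitrary.

```python
from math import log2, isclose

def g(lam):
    """C[lam, pi] = pi * (g(lam) + 1), for 0 < lam < infinity."""
    return -lam / (lam + 1) * log2(1 + 1 / lam) - log2(1 + lam) / (lam + 1)

def merged_ratio(pi1, lam1, pi2, lam2):
    """Likelihood ratio of the merged pair, as in (62)."""
    num = lam1 * pi1 * (lam2 + 1) + lam2 * pi2 * (lam1 + 1)
    den = pi1 * (lam2 + 1) + pi2 * (lam1 + 1)
    return num / den

def delta(lam1, pi1, lam2, pi2):
    """Capacity difference for finite probability sums."""
    lam12 = merged_ratio(pi1, lam1, pi2, lam2)
    return pi1 * g(lam1) + pi2 * g(lam2) - (pi1 + pi2) * g(lam12)

def delta_limit(lam1, pi1, lam2):
    """Limit of the capacity difference as the second sum tends to infinity."""
    first = -lam1 / (lam1 + 1) * log2((1 + 1 / lam1) / (1 + 1 / lam2))
    second = -log2((1 + lam1) / (1 + lam2)) / (lam1 + 1)
    return pi1 * (first + second)

# Arbitrary sample pairs with lam1 <= lam3.
lam1, pi1, lam3, pi3 = 1.5, 0.4, 4.0, 0.6
lam2 = merged_ratio(pi1, lam1, pi3, lam3)          # the choice made in (68)
d12 = delta_limit(lam1, pi1, lam2)
d23 = delta_limit(lam3, pi3, lam2)
d13 = delta(lam1, pi1, lam3, pi3)
identity_holds = isclose(d12 + d23, d13, abs_tol=1e-12)
```

Both `d12` and `d23` are non-negative here, so each is at most `d13`, which is exactly the conclusion of the lemma for this sample.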
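The next lemma shows how to discard the restraint put on $\lambda_2$ in Lemma 27.

Lemma 28: Let the likelihood ratios $\lambda_1$, $\lambda_3$ and the probability sums $\pi_1$, $\pi_3$ be as in Lemma 27. Fix

$$\lambda_1 \le \lambda_2 \le \lambda_3. \tag{70}$$

Then either

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3] \tag{71}$$

or

$$\Delta[\lambda_3,\pi_3;\lambda_2,\infty] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3]. \tag{72}$$

Proof: Let $\lambda_{1,3}$ be as in (68), and note that

$$\lambda_1 \le \lambda_{1,3} \le \lambda_3.$$

Assume w.l.o.g. that $\lambda_2$ is such that

$$\lambda_1 \le \lambda_2 \le \lambda_{1,3}.$$

From Lemma 27 we have that

$$\Delta[\lambda_1,\pi_1;\lambda_{1,3},\infty] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3].$$

Thus, we may assume that $\lambda_2 < \lambda_{1,3}$ and aim to prove that

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] \le \Delta[\lambda_1,\pi_1;\lambda_{1,3},\infty]. \tag{73}$$

Next, notice that

$$\Delta[\lambda_1,\pi_1;\lambda_2,\infty] = 0, \quad \text{if } \lambda_2 = \lambda_1. \tag{74}$$

Thus, let us assume that

$$\lambda_1 < \lambda_2 < \lambda_{1,3}.$$

Specifically, it follows that

$$0 < \lambda_2 < \infty,$$

and thus the assumption in Lemma 26 holds.

Define the function $f$ as follows:

$$f(\lambda_2') = \Delta[\lambda_1,\pi_1;\lambda_2',\infty].$$

Assume first that $\lambda_1 = 0$, and thus by (66) we have that

$$\frac{\partial f(\lambda_2')}{\partial \lambda_2'} > 0.$$

On the other hand, if $\lambda_1 \ne 0$ we must have that $0 < \lambda_1 < \infty$. Thus, by (64) we have that

$$\frac{\partial f(\lambda_2')}{\partial \lambda_2'} = \frac{\pi_1 \log_2 e}{(\lambda_1+1)(\lambda_2'+1)}\left(1 - \frac{\lambda_1}{\lambda_2'}\right),$$

which is also non-negative for $\lambda_2' \ge \lambda_1$. Thus, we have proved that the derivative is non-negative in both cases, and this together with (74) proves (73). ■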



We are now in a position to prove Theorem 8.

Proof of Theorem 8: We first consider the "Moreover" part of the theorem. Let $[\lambda_1,\pi_1]$, $[\lambda_2,\pi_2]$, and $[\lambda_3,\pi_3]$ correspond to $y_i$, $y_j$, and $y_k$, respectively. From Lemma 28 we have that either (71) or (72) holds. Assume w.l.o.g. that (71) holds. By Lemma 25 we have that

$$\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2] \le \Delta[\lambda_1,\pi_1;\lambda_2,\infty].$$

Thus,

$$\Delta[\lambda_1,\pi_1;\lambda_2,\pi_2] \le \Delta[\lambda_1,\pi_1;\lambda_3,\pi_3],$$

which is equivalent to

$$I(y_i,y_j) \ge I(y_i,y_k).$$

Having finished the "Moreover" part, we now turn our attention to the proof of (22). The two equalities in (22) are straightforward, so we are left with proving the inequality. For $\lambda \ge 0$ and $\pi > 0$, the following are easily verified:

$$C[\lambda,\pi] = C[1/\lambda,\pi], \tag{75}$$

and

$$C[\lambda,\pi] \text{ increases with } \lambda \ge 1. \tag{76}$$

Also, for $\bar{\lambda}$ as given in (62), $\lambda_1, \lambda_2 \ge 0$, and $\pi_1 > 0$, $\pi_2 > 0$, it is easy to show that

$$\bar{\lambda}[\pi_1,\lambda_1;\pi_2,\lambda_2] \text{ increases with both } \lambda_1 \text{ and } \lambda_2. \tag{77}$$

Let $[\lambda_1,\pi_1]$ and $[\lambda_2,\pi_2]$ correspond to $y_i$ and $y_j$, respectively. Denote

$$\gamma = \bar{\lambda}[\pi_1,\lambda_1;\pi_2,\lambda_2]$$

and

$$\delta = \bar{\lambda}[\pi_1,1/\lambda_1;\pi_2,\lambda_2].$$

Hence, our task reduces to showing that

$$C[\gamma,\pi_1+\pi_2] \ge C[\delta,\pi_1+\pi_2]. \tag{78}$$

Assume first that $\delta \ge 1$. Recall that both $\lambda_1 \ge 1$ and $\lambda_2 \ge 1$. Thus, by (77) we conclude that $\gamma \ge \delta \ge 1$. This, together with (76), finishes the proof.

Conversely, assume that $\delta \le 1$. Since

$$\bar{\lambda}[\pi_1,\lambda_1;\pi_2,\lambda_2] = \frac{1}{\bar{\lambda}[\pi_1,1/\lambda_1;\pi_2,1/\lambda_2]},$$

we have $1/\delta = \bar{\lambda}[\pi_1,\lambda_1;\pi_2,1/\lambda_2]$, and we now get from (77) that $\gamma \ge 1/\delta \ge 1$. This, together with (75) and (76), finishes the proof. ■



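Appendix B
Proof of Theorem 13

As a preliminary step toward the proof of Theorem 13, we convince ourselves that the notation $\Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3]$ used in the theorem is indeed valid. Specifically, the next lemma shows that knowledge of the arguments of $\Delta$ indeed suffices to calculate the difference in capacity. The proof is straightforward.

Lemma 29: For $i = 1,2,3$, let $y_i$ and $\lambda_i$, as well as $\pi_2$, be as in Theorem 13. If $\lambda_3 < \infty$, then

$$\begin{aligned}
\Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3] = \frac{\pi_2}{(\lambda_2+1)(\lambda_1-\lambda_3)}\Bigg[ &(\lambda_3-\lambda_2)\left(\lambda_1\log_2\left(1+\frac{1}{\lambda_1}\right) + \log_2(1+\lambda_1)\right) \\
+\,&(\lambda_2-\lambda_1)\left(\lambda_3\log_2\left(1+\frac{1}{\lambda_3}\right) + \log_2(1+\lambda_3)\right) \\
+\,&(\lambda_1-\lambda_3)\left(\lambda_2\log_2\left(1+\frac{1}{\lambda_2}\right) + \log_2(1+\lambda_2)\right) \Bigg].
\end{aligned} \tag{79}$$

Otherwise, $\lambda_3 = \infty$ and

$$\Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3=\infty] = \frac{\pi_2}{\lambda_2+1}\left[ -\lambda_1\log_2\left(1+\frac{1}{\lambda_1}\right) - \log_2(1+\lambda_1) + \lambda_2\log_2\left(1+\frac{1}{\lambda_2}\right) + \log_2(1+\lambda_2) \right]. \tag{80}$$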
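The two closed forms of Lemma 29 can be evaluated directly. The sketch below is an illustration only; it assumes $0 < \lambda_1 < \lambda_2 < \lambda_3$, and the names `k` and `delta_upgrade` are hypothetical.

```python
from math import log2, inf

def k(lam):
    """Shorthand for lam * log2(1 + 1/lam) + log2(1 + lam), with lam > 0."""
    return lam * log2(1 + 1 / lam) + log2(1 + lam)

def delta_upgrade(lam1, lam2, pi2, lam3):
    """Capacity difference of Lemma 29: (79) for finite lam3, (80) otherwise."""
    if lam3 == inf:
        return pi2 / (lam2 + 1) * (k(lam2) - k(lam1))
    bracket = ((lam3 - lam2) * k(lam1)
               + (lam2 - lam1) * k(lam3)
               + (lam1 - lam3) * k(lam2))
    return pi2 / ((lam2 + 1) * (lam1 - lam3)) * bracket
```

As a consistency check, evaluating the finite form with a very large `lam3` gives a value close to the `lam3 = inf` form, and sample evaluations shrink as `lam1` moves towards `lam2` and grow as `lam3` moves away from it.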



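Having the above calculations at hand, we are in a position to prove Theorem 13.

Proof of Theorem 13: First, let us consider the case $\lambda_3 < \infty$. Since our claim does not involve changing the values of $\lambda_2$ and $\pi_2$, let us fix them and denote

$$f(\lambda_1,\lambda_3) = \Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3].$$

Under this notation, it suffices to prove that $f(\lambda_1,\lambda_3)$ is decreasing in $\lambda_1$ and increasing in $\lambda_3$, where $\lambda_1 < \lambda_2 < \lambda_3$. A simple calculation shows that

$$\frac{\partial f(\lambda_1,\lambda_3)}{\partial \lambda_1} = \frac{-\pi_2(\lambda_3-\lambda_2)}{(1+\lambda_2)(\lambda_3-\lambda_1)^2}\left[ \lambda_3\log_2\left(\frac{1+\frac{1}{\lambda_1}}{1+\frac{1}{\lambda_3}}\right) + \log_2\left(\frac{1+\lambda_1}{1+\lambda_3}\right) \right]. \tag{81}$$

So, in order to show that $f(\lambda_1,\lambda_3)$ is decreasing in $\lambda_1$, it suffices to show that the term inside the square brackets is positive for all $\lambda_1 < \lambda_3$. Indeed, if we denote

$$g(\lambda_1,\lambda_3) = \lambda_3\log_2\left(\frac{1+\frac{1}{\lambda_1}}{1+\frac{1}{\lambda_3}}\right) + \log_2\left(\frac{1+\lambda_1}{1+\lambda_3}\right),$$

then it is readily checked that

$$g(\lambda_3,\lambda_3) = 0,$$

while

$$\frac{\partial g(\lambda_1,\lambda_3)}{\partial \lambda_1} = -(\log_2 e)\,\frac{\lambda_3-\lambda_1}{\lambda_1(\lambda_1+1)}$$

is negative for $\lambda_3 > \lambda_1$. Hence $g(\lambda_1,\lambda_3)$ decreases to 0 as $\lambda_1$ increases to $\lambda_3$, and is therefore positive for $\lambda_1 < \lambda_3$. The proof of $f(\lambda_1,\lambda_3)$ increasing in $\lambda_3$ is exactly the same, up to a change of variable names.

Let us now consider the second case, $\lambda_3 = \infty$. Similarly to what was done before, let us fix $\lambda_2$ and $\pi_2$, and consider $\Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3=\infty]$ as a function of $\lambda_1$. Denote

$$h(\lambda_1) = \Delta[\lambda_1;\lambda_2,\pi_2;\lambda_3=\infty].$$

Under this notation, our aim is to prove that $h(\lambda_1)$ is decreasing in $\lambda_1$. Indeed,

$$\frac{\partial h(\lambda_1)}{\partial \lambda_1} = \frac{-\pi_2\log_2\left(1+\frac{1}{\lambda_1}\right)}{\lambda_2+1}$$

is easily seen to be negative. ■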